Posted to dev@poi.apache.org by bu...@apache.org on 2016/06/21 14:30:06 UTC

[Bug 59739] New: Need to expand options in FileInformationBlock.assertCbRgFcLcb

https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

            Bug ID: 59739
           Summary: Need to expand options in
                    FileInformationBlock.assertCbRgFcLcb
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: major
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: tallison@mitre.org

Created attachment 33970
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33970&action=edit
triggering file

In tightening up the codebase based on FindBugs results[0], we're now throwing
an IllegalStateException if the FileInformationBlock doesn't match one of 5
patterns.
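
For readers without the diff at [0] open, here is a minimal sketch of what a strict
check of this kind looks like. It is not the POI source. The class name and the
boolean return are hypothetical, and the nFib -> cbRgFcLcb pairs are the five
documented combinations as recalled from the [MS-DOC] Fib description, so verify
them against the spec before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StrictFibCheck {
    // nFib -> expected cbRgFcLcb (assumption: taken from the [MS-DOC] spec from memory)
    static final Map<Integer, Integer> EXPECTED = new LinkedHashMap<>();
    static {
        EXPECTED.put(0x00C1, 0x005D);
        EXPECTED.put(0x00D9, 0x006C);
        EXPECTED.put(0x0101, 0x0088);
        EXPECTED.put(0x010C, 0x00A4);
        EXPECTED.put(0x0112, 0x00B7);
    }

    /**
     * Returns true if cbRgFcLcb matches the value expected for nFib.
     * Throws if nFib is not one of the five known versions, which is
     * the behaviour that trips on the attached file (nFib = 195).
     */
    static boolean assertCbRgFcLcb(int nFib, int cbRgFcLcb) {
        Integer expected = EXPECTED.get(nFib);
        if (expected == null) {
            throw new IllegalStateException(
                    "Invalid file format version number: " + nFib);
        }
        return expected == cbRgFcLcb;
    }
}
```

With this shape, any nFib outside the table fails hard, regardless of whether the
rest of the document is readable.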

A test file in Tika's test suite is now getting:

Caused by: java.lang.IllegalStateException: Invalid file format version number: 195
    at org.apache.poi.hwpf.model.FileInformationBlock.assertCbRgFcLcb(FileInformationBlock.java:164)
    at org.apache.poi.hwpf.model.FileInformationBlock.<init>(FileInformationBlock.java:140)
    at org.apache.poi.hwpf.HWPFDocumentCore.<init>(HWPFDocumentCore.java:163)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:197)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 34 more

[0]http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/model/FileInformationBlock.java?r1=1557290&r2=1738782

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #2 from Nick Burch <ap...@gagravarr.org> ---
Based on the file format specs -
https://msdn.microsoft.com/en-us/library/dd949344(v=office.12).aspx and
https://msdn.microsoft.com/en-us/library/dd950103(v=office.12).aspx - it looks
like we have all the "official" cases covered.

Do these files start passing again if you do a save-as from Word?



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

--- Comment #4 from Nick Burch <ap...@gagravarr.org> ---
I guess we need to work out what format type these are (maybe by seeing what
they contain, or what Word treats them as?), then add a few more "unofficial"
entries to the check list



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

--- Comment #8 from Dominik Stadler <do...@gmx.at> ---
I have applied a quick fix in r1750864 so that such documents do not fail until
we have decided on a complete solution here. The code now logs the unknown
version number instead of throwing an exception.
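
A rough sketch of the log-instead-of-throw approach, for illustration only. It
does not reproduce r1750864: the class and method names are hypothetical, and
java.util.logging stands in for POI's own logger so the example is
self-contained.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

public class LenientFibCheck {
    private static final Logger LOG =
            Logger.getLogger(LenientFibCheck.class.getName());

    // The five documented nFib values (assumption: per the [MS-DOC] spec)
    static final Set<Integer> KNOWN_NFIB = new HashSet<>(
            Arrays.asList(0x00C1, 0x00D9, 0x0101, 0x010C, 0x0112));

    /**
     * Returns true for a documented nFib. For anything else it logs a
     * warning and returns false, so parsing can carry on.
     */
    static boolean checkNFib(int nFib) {
        if (KNOWN_NFIB.contains(nFib)) {
            return true;
        }
        LOG.warning("Invalid file format version number: " + nFib);
        return false;
    }
}
```

The caller decides what to do with a false result. The point is only that an
unrecognised version number no longer aborts the whole parse.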



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unspecified                 |3.15-dev



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

--- Comment #3 from Tim Allison <ta...@mitre.org> ---
Yes, the problem goes away if I save as.

-- 


[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Dominik Stadler <do...@gmx.at> ---
I don't think we currently plan to invest more time on this, so I am closing
this as FIXED as all the sample-documents from our commoncrawl-corpus are now
handled correctly.



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

--- Comment #9 from Nick Burch <ap...@gagravarr.org> ---
In r1750866, I've changed it so that likely "nearby" values are subject to the
same check as the "official" values, and if passed no logging occurs. Hopefully
this helps and doesn't trigger any new warnings!



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

--- Comment #7 from Nick Burch <ap...@gagravarr.org> ---
Those look like clusters around the valid values (plus an outlier at 113)

I'd be tempted to say we accept any value +- 4 without much/any logging (and +-
113). Could someone check, for a few triggering files, how their cbRgFcLcb
values match with what the spec says to expect for the nearest-valid nFib
values?
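
One way to run that comparison, sketched under assumptions: the class and method
names are hypothetical, the tolerance is passed in rather than fixed, and the
nFib/cbRgFcLcb pairs are the documented ones as recalled from the [MS-DOC] spec.
Given an observed nFib, it returns the cbRgFcLcb the spec gives for the nearest
valid nFib within the tolerance, or nothing if no valid value is that close.

```java
import java.util.OptionalInt;

public class NearbyFib {
    // Documented nFib values and their cbRgFcLcb (assumption: from the spec)
    static final int[] VALID_NFIB   = {0x00C1, 0x00D9, 0x0101, 0x010C, 0x0112};
    static final int[] CB_RG_FC_LCB = {0x005D, 0x006C, 0x0088, 0x00A4, 0x00B7};

    /**
     * Finds the valid nFib closest to the observed one and returns its
     * expected cbRgFcLcb, or empty if none lies within the tolerance.
     */
    static OptionalInt expectedCbRgFcLcb(int nFib, int tolerance) {
        int best = -1;
        int bestDist = Integer.MAX_VALUE;
        for (int i = 0; i < VALID_NFIB.length; i++) {
            int dist = Math.abs(nFib - VALID_NFIB[i]);
            if (dist <= tolerance && dist < bestDist) {
                best = i;
                bestDist = dist;
            }
        }
        return best < 0 ? OptionalInt.empty()
                        : OptionalInt.of(CB_RG_FC_LCB[best]);
    }
}
```

With a tolerance of 4, the values reported in comment #6 resolve as follows:
191, 192, 194 and 195 map to 193 (0x00C1), 216 maps to 217 (0x00D9), and 265
and 267 map to 268 (0x010C). 113 has no neighbour and would need its own entry.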



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

--- Comment #1 from Tim Allison <ta...@mitre.org> ---
I'll run the full corpus regression tests shortly to see if there are other
values we need to add.  Or, should we undo this check?



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

--- Comment #6 from Dominik Stadler <do...@gmx.at> ---
In my regression-run, the following invalid versions are found:

EXCEPTIONTEXT315BETA2
java.lang.IllegalStateException: Invalid file format version number: 113
java.lang.IllegalStateException: Invalid file format version number: 191
java.lang.IllegalStateException: Invalid file format version number: 192
java.lang.IllegalStateException: Invalid file format version number: 194
java.lang.IllegalStateException: Invalid file format version number: 195
java.lang.IllegalStateException: Invalid file format version number: 216
java.lang.IllegalStateException: Invalid file format version number: 265
java.lang.IllegalStateException: Invalid file format version number: 267


We can add those as "undocumented", but I'd probably go with a
POILogger.warning() instead of the throw to not fail on other version-numbers
that are not included in our test-files.



[Bug 59739] Need to expand options in FileInformationBlock.assertCbRgFcLcb

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59739

--- Comment #5 from Javen O'Neal <on...@apache.org> ---
Reverting any deprecated code deletion I did as part of bug 59170 is also an
acceptable fix here. I was pretty heavy-handed with HWPF.
