Posted to dev@poi.apache.org by bu...@apache.org on 2016/11/22 21:36:40 UTC

[Bug 60405] New: AIOOBE: -32725 when loading an Excel file that includes some Excel 4.0 macros

https://bz.apache.org/bugzilla/show_bug.cgi?id=60405

            Bug ID: 60405
           Summary: AIOOBE: -32725 when loading an Excel file that
                    includes some Excel 4.0 macros
           Product: POI
           Version: 3.15-FINAL
          Hardware: PC
                OS: Mac OS X 10.1
            Status: NEW
          Severity: major
          Priority: P2
         Component: HSSF
          Assignee: dev@poi.apache.org
          Reporter: martin.oberhuber@gmx.at
  Target Milestone: ---

Created attachment 34467
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34467&action=edit
MyWB.xls containing Excel 4.0 macro functions

I'm using poi-3.15 through Sourceforge Docfetcher to index Excel documents
(among others) for searching. For the attached document, the following
exception is thrown, making it impossible to access any data in the document.
This is a severe issue for which I have no workaround.


java.lang.ArrayIndexOutOfBoundsException: -32725
        at org.apache.poi.ss.formula.function.FunctionMetadataRegistry.getFunctionByIndexInternal(FunctionMetadataRegistry.java:66)
        at org.apache.poi.ss.formula.function.FunctionMetadataRegistry.getFunctionByIndex(FunctionMetadataRegistry.java:62)
        at org.apache.poi.ss.formula.ptg.FuncVarPtg.create(FuncVarPtg.java:56)
        at org.apache.poi.ss.formula.ptg.FuncVarPtg.create(FuncVarPtg.java:45)
        at org.apache.poi.ss.formula.ptg.Ptg.createClassifiedPtg(Ptg.java:103)
        at org.apache.poi.ss.formula.ptg.Ptg.createPtg(Ptg.java:84)
        at org.apache.poi.ss.formula.ptg.Ptg.readTokens(Ptg.java:55)
        at org.apache.poi.ss.formula.Formula.getTokens(Formula.java:82)
        at org.apache.poi.hssf.record.FormulaRecord.getParsedExpression(FormulaRecord.java:314)
        at org.apache.poi.hssf.record.aggregates.FormulaRecordAggregate.getFormulaTokens(FormulaRecordAggregate.java:201)
        at org.apache.poi.hssf.usermodel.HSSFCell.getCellFormula(HSSFCell.java:649)
        at org.apache.poi.hssf.extractor.ExcelExtractor.getText(ExcelExtractor.java:339)
        at net.sourceforge.docfetcher.model.parse.MSExcelParser.renderText(MSExcelParser.java:57)


The issue is severe because the exception makes all other content of the
Excel file unavailable. Docfetcher runs POI like this:

    POIFSFileSystem fs = new POIFSFileSystem(in);
    extractor = new ExcelExtractor(fs);
    extractor.setFormulasNotResults(true);
    return extractor.getText();

Running my test case in the debugger, the exception seems to occur on sheet
"Macro1", row 1, column 0, which contains this:
    =ALIGNMENT(2;FALSE;1;0;FALSE)
This appears to correspond to the following FormulaRecord._byteEncoding:
    [30, 2, 0, 29, 0, 30, 1, 0, 30, 0, 0, 29, 0, 66, 5, 43, -128]
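For readers who want to see where -32725 comes from, the bytes above can be walked by hand. The following is a minimal sketch, not POI code: the class name, the constants, and the assumed token layout (0x1E as a 3-byte integer token, 0x1D as a 2-byte boolean token, 0x42 as a 4-byte variable-argument function token) are illustrative assumptions based on my reading of the BIFF formula format, not something stated in this report.

```java
public class FormulaTokenWalk {
    // Bytes quoted in the report above (FormulaRecord._byteEncoding).
    static final byte[] ENCODING =
            {30, 2, 0, 29, 0, 30, 1, 0, 30, 0, 0, 29, 0, 66, 5, 43, -128};

    // Assumed token ids and total sizes (id byte included); not from the report.
    static final int PTG_BOOL = 0x1D;       // 2 bytes: id, value
    static final int PTG_INT = 0x1E;        // 3 bytes: id, 16-bit value
    static final int PTG_FUNC_VAR_V = 0x42; // 4 bytes: id, arg count, 16-bit index

    /** Reads two bytes, low byte first, and narrows them to a signed short. */
    static short readShortLE(byte[] data, int offset) {
        return (short) ((data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8));
    }

    /** Returns the offset of the first function token, or -1 if none is found. */
    static int findFuncVar(byte[] data) {
        int pos = 0;
        while (pos < data.length) {
            int id = data[pos] & 0xFF;
            if (id == PTG_FUNC_VAR_V) {
                return pos;
            } else if (id == PTG_INT) {
                pos += 3;
            } else if (id == PTG_BOOL) {
                pos += 2;
            } else {
                return -1; // a token this sketch does not know about
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int pos = findFuncVar(ENCODING);
        int argCount = ENCODING[pos + 1] & 0xFF;
        short index = readShortLE(ENCODING, pos + 2);
        // Prints: offset=13 args=5 index=-32725
        System.out.println("offset=" + pos + " args=" + argCount + " index=" + index);
    }
}
```

Under these assumptions the five operand tokens match the five arguments of the formula, and the last two bytes (43, -128), read as a signed 16-bit value, give exactly the index from the exception message.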

Note that the document must be opened with "Macros disabled" in order to make
this content visible in Excel. The sheet in question contains several other
"Excel 4.0 macro functions".

The expected behavior is that POI ignores unknown functions/formulas/macros,
so that at least the rest of the document can be indexed for search.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60405] AIOOBE: -32725 when loading an Excel file that includes some Excel 4.0 macros

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60405

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from Dominik Stadler <do...@gmx.at> ---
I have taken some initial steps toward a fix, but I am not sure whether we
parse the macro functions correctly.

Can you post the actual macros that are stored in the XLS file?



[Bug 60405] AIOOBE: -32725 when loading an Excel file that includes some Excel 4.0 macros

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60405

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #3 from Dominik Stadler <do...@gmx.at> ---
SVN r1852277 adds initial support for the cetab list of functions from the
spec. Parsing the formulas of this document works now.



[Bug 60405] AIOOBE: -32725 when loading an Excel file that includes some Excel 4.0 macros

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60405

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #2 from Dominik Stadler <do...@gmx.at> ---
This seems to go quite a bit deeper than a simple parse error. The spec
contains a separate list of functions called "cetab", which Apache POI does not
support at all. The AIOOBE occurs because the parser does not read the
"fCeFunc" bit separately, so the bit ends up in the function index and pushes
it out of bounds:


-----------
tab (15 bits): A structure that specifies the function to be called. If fCeFunc
is 1, then this field specifies a Cetab value. If fCeFunc is 0, then this field
specifies a Ftab value.

C - fCeFunc (1 bit): A bit that specifies whether tab specifies a Cetab value
or a Ftab value.
-----------


So the fix requires not only reading the fCeFunc bit when parsing FuncVarPtg,
but also implementing a second list of known function definitions, which may
in turn require implementing new functions later.
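The parsing half of that fix can be illustrated with plain bit masks. This is only a sketch under stated assumptions, not the change that went into POI: the class and method names are made up for illustration, and it assumes that fCeFunc is the most significant bit of the 16-bit field with tab in the low 15 bits, which is consistent with the index -32725 seen in the exception.

```java
public class FuncVarField {
    // Assumption: fCeFunc is the top bit and tab the low 15 bits of the field.
    static final int CE_FUNC_MASK = 0x8000;
    static final int TAB_MASK = 0x7FFF;

    /** True if the field selects the Cetab list rather than the Ftab list. */
    static boolean isCeFunc(int field) {
        return (field & CE_FUNC_MASK) != 0;
    }

    /** The 15-bit function index with the fCeFunc bit stripped off. */
    static int tab(int field) {
        return field & TAB_MASK;
    }

    public static void main(String[] args) {
        // Index from the exception message, viewed as an unsigned 16-bit field.
        int field = -32725 & 0xFFFF;
        // Prints: field=0x802b fCeFunc=true tab=43
        System.out.println("field=0x" + Integer.toHexString(field)
                + " fCeFunc=" + isCeFunc(field)
                + " tab=" + tab(field));
    }
}
```

With the bit separated out, the remaining index is a small non-negative number that can be looked up in whichever list the flag selects, instead of being used directly as an out-of-range array index.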


BTW, I did not find any such exception in our large regression testing (
http://people.apache.org/~centic/poi_regression/reportsAll/ ), which indicates
that such files are likely very rare.
