You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2015/09/24 03:29:01 UTC

[Bug 58452] New: Copy cell formulas containing unregistered Ptgs

https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

            Bug ID: 58452
           Summary: Copy cell formulas containing unregistered Ptgs
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: SS Common
          Assignee: dev@poi.apache.org
          Reporter: javenoneal@gmail.com

Carried over from poi-user mailing list
Original post:
http://mail-archives.apache.org/mod_mbox/poi-user/201509.mbox/%3CCAM%2BTppJHZRk4QfZ%3DJ8WF0zw5%2B%2BrHi-XOnXAXTNezHsgXtr2gPQ%40mail.gmail.com%3E
Nick's response:
http://mail-archives.apache.org/mod_mbox/poi-user/201509.mbox/%3Calpine.DEB.2.02.1509221216360.18373%40urchin.earth.li%3E

If a formula contains one or more unregistered UDF's, a FormulaParseException
is normally thrown.

I need a way to allow the formula to be parsed, using placeholder Ptgs for the
unregistered functions. With this array of Ptgs, I can manipulate registered
Ptgs as needed, then recreate the cell formula and write it to the cell.

Formulas containing unregistered functions will not be eligible for computation
in the calculation chain.

Solution will be targeted for XSSFWorkbooks, but it's possible a fix may also
be available for HSSFWorkbooks if the source cell that the formula is being
copied from was created by Excel (that is, Excel has validated the formula and
the Ptg is saved in the BIFF file).

>From Nick via poi-user mailing list
> [Add formula parsing option] to skip / unchange unknown function names? (It 
> might actually be easier to do the copy in the Ptg space, rather than 
> Formula text space, as with the ptgs you could check for non-absolute 
> references and update just them. That'd mean you'd need a ways to flag to 
> skip these functions / leave them unchanged or something)

I'll hack away at this to see what I can come up with.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] Copy cell formulas containing unregistered function names

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #33172|0                           |1
        is obsolete|                            |

--- Comment #8 from Javen ONeal <ja...@gmail.com> ---
Created attachment 33173
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33173&action=edit
main FormulaParser changes

Unit test should fail if evaluating the formula 'MYFUNC("B1")' does not raise a
org.apache.poi.ss.formula.eval.NotImplementedFunctionException/NotImplementedException.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] Copy cell formulas containing unregistered Ptgs

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

--- Comment #2 from Javen ONeal <ja...@gmail.com> ---
Created attachment 33167
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33167&action=edit
supporting changes (java docs, add FormulaParsingWorkbook.createName)

Two approaches here how to handle when an "completely unknown name" is found in
the workbook:

1) Add the name to the workbook and use that to create the NamePtg that is
linked to the workbook.
> nameToken = _book.getNameXPtg(name, null);
> if (nameToken == null) {
>     // name is not an internal or external name
>     if (log.check(POILogger.WARN)) {
>         log.log(POILogger.WARN,
>                "Name '" + name + "' is completely unknown in the current workbook.");
>     }
>     // name is probably the name of an unregistered User-Defined Function
>     addName(name);
>     hName = _book.getName(name, _sheetIndex);
>     nameToken = hName.createPtg();
> }

2) Do not add the name to the workbook. Create a NamePtg that isn't linked to
the workbook.
https://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java?revision=1700670&view=markup#l943
> nameToken = _book.getNameXPtg(name, null);
> if (nameToken == null) {
>     if (log.check(POILogger.WARN)) {
>         log.log(POILogger.WARN,
>                 "Name '" + name + "' is completely unknown in the current workbook.");
>     }
>     switch(_book.getSpreadsheetVersion()) {
>         case EXCEL97:
>             nameToken = new NameXPtg(name);
>             break;
>         case EXCEL2007:
>             nameToken = new NameXPxg(name);
>             break;
>         default:
>             throw new IllegalStateException("Unrecognized SpreadsheetVersion" + _book.getSpreadsheetVersion());
>     }
> }

The second case seems like a cleaner solution since it doesn't introduce weird
side-effects (name gets added to workbook).
However, HSSFWorkbooks require all non-built-in-names (for named ranges and
UDFs, registered or not) to have an entry in the internal or external names
table. Thus, in order to save an HSSFWorkbook, we must add the name to the
workbook, meaning option 1 is the only feasible option here. This is because
the function name token is serialized as an index into the worksheet names
table, not a string of the function name.

Workbook.createName is currently not accessible in FormulaParser.parse. In the
attached patch, I add createName to the FormulaParsingWorkbook interface so
that these names can be created.
FormulaParsingWorkbook {
  public Name createName() {
    return _uBook.createName()
  }
}
Alternatively, we could just expose the _uBook Workbook via a getWorkbook()
call, and that'd save us from proxying so many functions to the underlying
Workbook. Let me know if you'd prefer that solution and I will upload the
patches with those changes.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] [PATCH] Copy cell formulas containing unregistered function names

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Copy cell formulas          |[PATCH] Copy cell formulas
                   |containing unregistered     |containing unregistered
                   |function names              |function names
           Keywords|                            |PatchAvailable

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] Copy cell formulas containing unregistered function names

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #33168|0                           |1
        is obsolete|                            |

--- Comment #7 from Javen ONeal <ja...@gmail.com> ---
Created attachment 33172
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33172&action=edit
main FormulaParser changes

main changes (commit after supporting changes and testNames.xlsm)

Behavior:
* For EXCEL97 workbooks, a defined name is required, so add one to the workbook
* For EXCEL2007 workbooks, external names are saved as strings instead of in a
table and referenced by index, so create a NameXPxg without adding a defined
name to the workbook.

I checked the output to make sure Excel is able to open the workbooks without
complaining about corrupted contents. No issues in this build
(/test-data/spreadsheet/testNames-saved.xls and testNames-saved.xlsm)

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] [PATCH] Copy cell formulas containing unregistered function names

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[PATCH] Copy cell formulas  |[PATCH] Copy cell formulas
                   |containing unregistered     |containing unregistered
                   |Ptgs                        |function names

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] [PATCH] Set cell formulas containing unregistered function names

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
            Summary|[PATCH] Copy cell formulas  |[PATCH] Set cell formulas
                   |containing unregistered     |containing unregistered
                   |function names              |function names

--- Comment #9 from Javen ONeal <ja...@gmail.com> ---
Applied supporting changes in r1711600, main changes in r1711605 to trunk

Site docs updated in r1711608.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] Copy cell formulas containing unregistered Ptgs

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

--- Comment #1 from Javen ONeal <ja...@gmail.com> ---
No code just yet. Here's what I have figured out so far (mostly a brain dump
for me, a passive request for help/direction from formula/ptg experts, and a
guide for future contributors):

Using testdata/spreadsheet/testNames.xls, UDF's (VBA macros) are saved in the
workbook names (Workbook.getName(String)), along with named ranges.

Converting this to an Excel2007 files, the UDFs are not included in the
workbook names.

This means that the behavior of FormulaParser.parse("MyFunc(\"arg\")", fpb,
FormulaParser.CELL, -1); is different for Excel97 and Excel2007 booms when
MyFunc is a UDF. Parse throws FormulaParseException if the formula contains a
UDF that isn't in the workbook names for both.

For my application, I want to copy formulas and shift the cell references, so I
will not be adding new formulas to the workbook. I'm not sure yet if I want to
implement a solution that could handle new formula names (since that would
create an invalid formula without adding the corresponding UDF to
vbaProject.bin or replacing vbaProject.bin with a vbaProject.bin from another
workbook.

Since XSSFWorkbook.getName() only includes named ranges, I'm assuming adding
UDFs to XSSFWotkbook._names is not desired. Right now I'm looking at creating a
variant of NamePtg that can hold information for XSSFWorkbook UDFs, so that
FormulaParser.parse returns ptgs that could be rendered back to a formula
string. Without registering a UDF with the same  name with the workbook via
XSSFWorkbook.addToolPack(UDFFinder), trying to evaluate a formula containing a
nkn-registered UDF will fail.

I could see where a user may rely on formula parse errors to detect
unregistered UDFs in a formula string, so I could introduce this fix by
requiring the user call Workbook.setAllowParsingUnregisteredUDFs(true), but I'm
thinking there are very few if any scenarios where someone would want to
disallow unregistered UDFs where they couldn't more directly detect it--such as
checking for UnregisteredNamePtg instances in the result of parse or otherwise.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] [PATCH] Copy cell formulas containing unregistered Ptgs

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |PatchAvailable
            Summary|Copy cell formulas          |[PATCH] Copy cell formulas
                   |containing unregistered     |containing unregistered
                   |Ptgs                        |Ptgs

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] Copy cell formulas containing unregistered function names

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|PatchAvailable              |
            Summary|[PATCH] Copy cell formulas  |Copy cell formulas
                   |containing unregistered     |containing unregistered
                   |function names              |function names

--- Comment #4 from Javen ONeal <ja...@gmail.com> ---
Current patches (attachment 33167 and attachment 33168) result in a corrupted
XLSM workbook.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] Copy cell formulas containing unregistered function names

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #33167|0                           |1
        is obsolete|                            |

--- Comment #6 from Javen ONeal <ja...@gmail.com> ---
Created attachment 33171
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33171&action=edit
supporting changes (java docs, add FormulaParsingWorkbook.createName)

Supporting changes, including java docs updates where documentation was lacking

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] Copy cell formulas containing unregistered Ptgs

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

--- Comment #3 from Javen ONeal <ja...@gmail.com> ---
Created attachment 33168
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33168&action=edit
main FormulaParser changes

When parsing a formula, if there is an unknown name in the formula, add the
name to the workbook and continue parsing. This is consistent with how Excel
works: it allows formulas to contain "unrecognized text" or "invalid name
errors", but the formula result is #NAME?.

Formula evaluation is unchanged: a
org.apache.poi.ss.formula.eval.NotImplementedFunctionException/NotImplementedException
is thrown if a formula contains a UDF that isn't registered
(workbook.addToolPack).


In this patch:
* FormulaParser.parse no longer throws FormulaParseException for formulas
containing unregistered functions  such as '=myFunc("arg")'.
* modify TestExternalFunctions as necessary
* add FormulaParser unit tests to make sure formulas with invalid syntax still
throw FormulaParseException.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58452] Copy cell formulas containing unregistered function names

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58452

--- Comment #5 from Javen ONeal <ja...@gmail.com> ---
Created attachment 33170
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33170&action=edit
testNames.xlsm - Test case spreadsheet

Add this attached workbook as /test-data/spreadsheet/testNames.xlsm
Set the appropriate mime-type: application/vnd.ms-excel.sheet.macroEnabled.12
or application/octet-stream

This workbook is based on /test-data/spreadsheet/testNames.xls [1] at r1614697.
I used MS Excel 2013 to convert this to Excel 2007 format, and removed personal
information.

[1]
http://svn.apache.org/viewvc/poi/trunk/test-data/spreadsheet/testNames.xls?view=log&pathrev=1614697

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org