You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@poi.apache.org by Greg Woolsey <gr...@gmail.com> on 2017/01/27 01:47:45 UTC

Evaluating expressions outside a cell value context

There are two main types of formula/reference expressions that are not
stored or evaluated as a cell value in Excel files.  Evaluating them in POI
currently requires a hack.

If that paragraph doesn't indicate a topic you care about, stop reading,
the rest is long and detailed :)

------------------------------------------------------

The types are:

1. Conditional Formatting rules (sometimes, depending on the rule, and then
either one or two, depending on the rule type)
2. Data Validation value ranges

I haven't encountered others, but I can imagine an application where
someone may want to evaluate an arbitrary expression also, without setting
it first as a cell formula.

This can be done today by accessing the evaluate* package-scoped methods in
o.a.p.ss.formula.WorkbookEvaluator from a class defined in that package.
This requires some understanding of formula parsing and the resulting Ptg
array, and uses a method explicitly noted to not be for public use.

A general-purpose public method would look something like the other
evaluate* methods:

public ValueEval evaluate(String formula, CellReference ref, FormulaType
formulaType) {
    final String sheetName = ref == null ? null : ref.getSheetName();
    if (sheetName == null) throw new IllegalArgumentException("Sheet name
is required");
    final int sheetIndex = getWorkbook().getSheetIndex(sheetName);
    final OperationEvaluationContext ec = new
OperationEvaluationContext(this, getWorkbook(), sheetIndex, ref.getRow(),
ref.getCol(), new EvaluationTracker(_cache));
    Ptg[] ptgs = FormulaParser.parse(formula, (FormulaParsingWorkbook)
getWorkbook(), formulaType, sheetIndex);
    return evaluateNameFormula(ptgs, ec);
}

and be fairly simple.

However, we may also want a more semantic way to handle evaluating data
validations and conditional formatting.  Using them has several
complexities and wrinkles that make it hard to do right in all cases, and
are monkey motion that would require the same boilerplate code every time
they are used.  These are prime candidates for additions to the POI API.
In that case, we have these questions we can address via a higher-level API:

1. What data validation rule applies to a given cell?  Null for none
2. What are the valid values allowed for a given cell? (evaluate the data
validation rule if any, or null if no rule applies)
-- this can hide the complexity inherent in the OOXML spec, which stores
both a statically defined list as CSV text and a cell range as a range
expression
3. What cells use a given data validation rule?
4. What conditional formatting rule applies to a given cell?
-- this would check the cell value against defined rules to match their
"applies to" range and evaluate matching ones in rule order (currently the
API returns them in storage order, which has no relationship to Excel
defined priority, which is an attribute).
5. What cells currently match a given conditional format rule?
-- this would check all cells in the rule's "applies to" range using the
logic from #4
-- this might be useful for systems that render Excel files, to apply the
rule format to all the matching cells at once.

I think it would be good to address all of these in POI itself, to
standardize the logic and have tests that validate against results verified
by hand in Excel.

This is definitely a candidate for a branch/discuss/refine/merge workflow,
which I'm willing to pursue.  I haven't done SVN branches and merges in
about 7 years, but how bad can it be? :D

One wrinkle is that I've seen the Vaadin code that does the conditional
formatting, because I'm a licensed customer, but it is using their
commercial license.  That means I can't do the POI work on the more complex
and complete steps above.

The general-purpose method above is based on code I wrote and submitted to
them, so it's fine.

I also have code I've written for data validation evaluation, so I can do
that as well.

I'm also willing to get my boss to contact Vaadin and see if they will
release the relevant bits of conditional formatting code to us for use in
POI with the Apache 2 license.  She's very good at that sort of thing.  I
wouldn't want it directly, but the patterns it follows likely mirror some
of what we need, and I don't want to cause a problem where there doesn't
need to be one.

Greg

Re: Evaluating expressions outside a cell value context

Posted by Javen O'Neal <on...@apache.org>.

Formulas, or at least cell references, show up in a few other places. Any
field that could either be a cell reference or a list of literal values,
such as specifying either A1:A2 or {1,1} for a data source can't be parsed
by just CellReference.
...
3. Named Ranges
4. Pivot Table data source
5. Database data source, possibly
6. Charts

> That means I can't do the POI work on the more complex
and complete steps above.

I'd be interested in writing some of the formula handling code if you need
a fresh set of eyes.

On Jan 26, 2017 5:48 PM, "Greg Woolsey" <gr...@gmail.com> wrote:

There are two main types of formula/reference expressions that are not
stored or evaluated as a cell value in Excel files.  Evaluating them in POI
currently requires a hack.

If that paragraph doesn't indicate a topic you care about, stop reading,
the rest is long and detailed :)

------------------------------------------------------

The types are:

1. Conditional Formatting rules (sometimes, depending on the rule, and then
either one or two, depending on the rule type)
2. Data Validation value ranges

I haven't encountered others, but I can imagine an application where
someone may want to evaluate an arbitrary expression also, without setting
it first as a cell formula.

This can be done today by accessing the evaluate* package-scoped methods in
o.a.p.ss.formula.WorkbookEvaluator from a class defined in that package.
This requires some understanding of formula parsing and the resulting Ptg
array, and uses a method explicitly noted to not be for public use.

A general-purpose public method would look something like the other
evaluate* methods:

public ValueEval evaluate(String formula, CellReference ref, FormulaType
formulaType) {
    final String sheetName = ref == null ? null : ref.getSheetName();
    if (sheetName == null) throw new IllegalArgumentException("Sheet name
is required");
    final int sheetIndex = getWorkbook().getSheetIndex(sheetName);
    final OperationEvaluationContext ec = new
OperationEvaluationContext(this, getWorkbook(), sheetIndex, ref.getRow(),
ref.getCol(), new EvaluationTracker(_cache));
    Ptg[] ptgs = FormulaParser.parse(formula, (FormulaParsingWorkbook)
getWorkbook(), formulaType, sheetIndex);
    return evaluateNameFormula(ptgs, ec);
}

and be fairly simple.

However, we may also want a more semantic way to handle evaluating data
validations and conditional formatting.  Using them has several
complexities and wrinkles that make it hard to do right in all cases, and
are monkey motion that would require the same boilerplate code every time
they are used.  These are prime candidates for additions to the POI API.
In that case, we have these questions we can address via a higher-level API:

1. What data validation rule applies to a given cell?  Null for none
2. What are the valid values allowed for a given cell? (evaluate the data
validation rule if any, or null if no rule applies)
-- this can hide the complexity inherent in the OOXML spec, which stores
both a statically defined list as CSV text and a cell range as a range
expression
3. What cells use a given data validation rule?
4. What conditional formatting rule applies to a given cell?
-- this would check the cell value against defined rules to match their
"applies to" range and evaluate matching ones in rule order (currently the
API returns them in storage order, which has no relationship to Excel
defined priority, which is an attribute).
5. What cells currently match a given conditional format rule?
-- this would check all cells in the rule's "applies to" range using the
logic from #4
-- this might be useful for systems that render Excel files, to apply the
rule format to all the matching cells at once.

I think it would be good to address all of these in POI itself, to
standardize the logic and have tests that validate against results verified
by hand in Excel.

This is definitely a candidate for a branch/discuss/refine/merge workflow,
which I'm willing to pursue.  I haven't done SVN branches and merges in
about 7 years, but how bad can it be? :D

One wrinkle is that I've seen the Vaadin code that does the conditional
formatting, because I'm a licensed customer, but it is using their
commercial license.  That means I can't do the POI work on the more complex
and complete steps above.

The general-purpose method above is based on code I wrote and submitted to
them, so it's fine.

I also have code I've written for data validation evaluation, so I can do
that as well.

I'm also willing to get my boss to contact Vaadin and see if they will
release the relevant bits of conditional formatting code to us for use in
POI with the Apache 2 license.  She's very good at that sort of thing.  I
wouldn't want it directly, but the patterns it follows likely mirror some
of what we need, and I don't want to cause a problem where there doesn't
need to be one.

Greg