You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2016/12/25 16:10:33 UTC

[Bug 60517] New: Incorrect parsing multiple sheet in formula

https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

            Bug ID: 60517
           Summary: Incorrect parsing multiple sheet in formula
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: HSSF
          Assignee: dev@poi.apache.org
          Reporter: zhangchaowang@gmail.com
  Target Milestone: ---

Created attachment 34554
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34554&action=edit
test file

Summary:

In my Excel research project from EUSES data base, I found that HSSF in POI
3.15 final release didn't parse multiple sheet formula correctly.

Details:

When the multiple sheet name quoted by single quote, HSSF put single quote
between every sheet name. (See at sheet: Summary in cell: D17 of formula:
SUM('1003':'1856'!D28))

For example,

Formula: = SUM('1003:1856'!D28)

Here the sheet name are 1003 and 1856. When HSSF parse string, it shows:

SUM('1003':'1856'!D28)

I should expect the formula string should be :

SUM('1003:1856'!D28)

If I convert XLS into XLSX file, the new XSSF parse it correctly as
SUM('1003:1856'!D28)

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60517] Incorrect parsing multiple sheet in formula

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

Javen O'Neal <on...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|Linux                       |All
           Hardware|PC                          |All

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60517] Incorrect parsing multiple sheet in formula

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

Javen O'Neal <on...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from Javen O'Neal <on...@apache.org> ---
Could you include a code snippet of what you're doing, including any
manipulations you are doing to the stored formula string and how you're getting
the formula string.

FWIW, when I open your attached XLS file in LibreOffice, the following formula
appears in D17:
=SUM($'1003'.D28:$'1856'.D28)

When I use LibreOffice to save the file as XLSX, I get:
=SUM('1003':$'1856'.D28:D28)

And after converting to XLSX, the value saved in sheet1.xml is:
<c r="D17" s="19" t="n">
<f aca="false">
SUM(&apos;1003&apos;:&apos;1856&apos;!D28:D28)</f>
<v>
102320067</v>
</c>

It may be possible that both versions are correct.
For example, 'Sheet1'!A1 and Sheet1!A1 are both correct. In general, single
quotations are needed whenever a character in the sheet name could cause the
formula to be parsed differently (whitespace, punctuation, sheet name looks
like a cell reference). I would assume that '1003:1856' would not be correct as
that would imply that there is a sheet named "1003:1856".

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60517] Incorrect parsing multiple sheet in formula

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

--- Comment #4 from Ricky <zh...@gmail.com> ---
As you found it out, single quote is very important in multiple sheet of
formula string,

1. The single quote of sheet name is kept if any characters such as
#"=<>&+-%,SINGLE_SPACE exist in sheet name. eg, if you put # sign inside sheet
name, ie 'Sheet#4', the single quote is preserved

2. The parsed formula string of multiple sheet from POI 3.15 is incorrect. At
any time, it is illegal to quote sheet name twice =SUM('1003':'1856'!D28). The
correct parse result should be SUM('1003:1856'!D28)

I tried to figure it out how POI generate parse result and patch it by myself.
But IMHO the formula parsing code is not quite readable. In the absence of
development document, it is better to leave maintainer to fix it.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60517] Incorrect parsing multiple sheet in formula

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

--- Comment #5 from Javen O'Neal <on...@apache.org> ---
Created attachment 34588
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34588&action=edit
Simplified test case workbook XLSX

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60517] Incorrect parsing multiple sheet in formula

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

--- Comment #3 from Mark Murphy <jm...@apache.org> ---
(In reply to Javen O'Neal from comment #1)
> Could you include a code snippet of what you're doing, including any
> manipulations you are doing to the stored formula string and how you're
> getting the formula string.
> 
> FWIW, when I open your attached XLS file in LibreOffice, the following
> formula appears in D17:
> =SUM($'1003'.D28:$'1856'.D28)
> 
> When I use LibreOffice to save the file as XLSX, I get:
> =SUM('1003':$'1856'.D28:D28)
> 
> And after converting to XLSX, the value saved in sheet1.xml is:
> <c r="D17" s="19" t="n">
> <f aca="false">
> SUM(&apos;1003&apos;:&apos;1856&apos;!D28:D28)</f>
> <v>
> 102320067</v>
> </c>
> 
> It may be possible that both versions are correct.
> For example, 'Sheet1'!A1 and Sheet1!A1 are both correct. In general, single
> quotations are needed whenever a character in the sheet name could cause the
> formula to be parsed differently (whitespace, punctuation, sheet name looks
> like a cell reference). I would assume that '1003:1856' would not be correct
> as that would imply that there is a sheet named "1003:1856".

I did not know a range could look like this.

I did some testing in Excel 2016 (Windows), and it appears that
=SUM('1003:1856'!A1) is a valid range syntax where there is a tab named 1003
and another tab named 1856. All tabs physically between tabs 1003 and 1856 will
be included in the sum. So if I created a tab named Sheet4 and put a value in
cell A1, that cell is included in the sum only if I move tab Sheet4 to a
position between tabs 1003 and 1856. This formula is maintained, including
single quotes, if I save and reopen in XLS or XLSX format. If I add another tab
Sheet5, and then make the formula =SUM('Sheet4:Sheet5'!A1), Excel changes it to
=SUM(Sheet4:Sheet5!A1)

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60517] Incorrect parsing multiple sheet in formula

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

Ricky <zh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60517] Incorrect parsing multiple sheet in formula

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

--- Comment #2 from Ricky <zh...@gmail.com> ---
It is incorrect in your interpretation. What I refer in my previous post is
multiple sheet reference, ie two sheet 1003 and 1856 of the same cells. What
you did is two different cells =SUM($'1003'.D28:$'1856'.D28)

I tried to type =SUM('1003':'1856'!D28)) in Microsoft Excel 2016 Mac, but it
pops up an error dialog.

I only read/parse Excel by POI no update/write in my test.

Here is part of incomplete code that is for demo purpose:

        private static HSSFWorkbook createHSSFWorkBook(String fileName) throws
IOException {
            FileInputStream fis = new FileInputStream(fileName);
            try {
                return new HSSFWorkbook(fis);
            } catch (Exception e) {
                logger.warn("Failed to parse Excel file: " + fileName, e);
                return null;
            } finally {
                fis.close();
            }
        }

        public static void generateJson(String sheetFilePath, String
jsonFilePath) throws IOException {

                // load external Excel file by POI
                Workbook wb = null;
                if (sheetFilePath.endsWith(".xlsx"))
                        wb = new XSSFWorkbook(sheetFilePath);
                else {
                        wb = XlParser.createHSSFWorkBook(sheetFilePath);
                        if (wb == null)
                                return;
                }

                // loop through each sheet
        for(int sheetIndex = 0; sheetIndex < wb.getNumberOfSheets();
sheetIndex++) {
                Sheet sheet = wb.getSheetAt(sheetIndex);
                // loop through each row
                for (Row row : sheet) {
                        // loop through each cell
                    for (Cell cell : row) {
                        // retrieve cell reference
                        CellReference cellRef = new
CellReference(row.getRowNum(), cell.getColumnIndex());
                        switch(cell.getCellTypeEnum()) {
                                case FORMULA:
                                        String cellRefString =
cellRef.formatAsString();
                                        String formulaString = null;
                                        try {
                                                formulaString =
cell.getCellFormula();
                                        } catch (Exception e) {
                                                logger.fatal("Failed to get
formula in cell " + cellRefString);
                                                throw e;
                                        }

                                break;
                                case BOOLEAN:

                                        break;
                                case ERROR:

                                        break;
                                case NUMERIC:

                                        break;
                                case STRING:

                                        break;
                                default:
                                        break;
                        }
                    }// end of cell
                }// end of row
        }//end of sheet
 }

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60517] Incorrect parsing multiple sheet in formula

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60517

--- Comment #6 from Javen O'Neal <on...@apache.org> ---
Created attachment 34589
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34589&action=edit
Simplified test case workbook XLS

It looks like Excel 2013 allows sheet ranges to be unquoted if the first and
last sheet do not contain special characters that would require quoting for a
single sheet cell reference.

Otherwise, Excel 2013 quotes the sheet range instead of quoting the sheets
individually.

Valid: Sheet1:Sheet2!A1
Valid: 'Sheet1:Sheet2'!A1
Valid: 'Sheet 1:Sheet 2'!A1
Invalid: 'Sheet1':'Sheet2'!A1
Invalid: 'Sheet1'!A1:'Sheet2'!A1
Invalid: Sheet 1:Sheet 2!A1

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org