You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@openoffice.apache.org by bu...@apache.org on 2023/01/06 11:05:03 UTC

[Issue 128551] New: XSLX import: missing all text

https://bz.apache.org/ooo/show_bug.cgi?id=128551

          Issue ID: 128551
        Issue Type: DEFECT
           Summary: XSLX import: missing all text
           Product: Calc
           Version: 4.2.0-dev
          Hardware: All
                OS: All
            Status: CONFIRMED
          Severity: Normal
          Priority: P5 (lowest)
         Component: open-import
          Assignee: issues@openoffice.apache.org
          Reporter: damjan@apache.org
  Target Milestone: ---

I have a bunch of huge XSLX spreadsheets (100000+ rows) from a 3rd party, and
they all show no text at all after opening. All the sheets are present, a
picture is present, some column background are present, but there is no text,
no numbers, no dates, nothing: only empty cells everywhere.

Excel and LibreOffice open them perfectly.

The fix for bug 126720 doesn't help here - it's a different issue.

I am not allowed to distribute these files, so I am using this bug to keep my
own notes while debugging OpenOffice.

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 128551] XSLX import: missing all text

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=128551

--- Comment #3 from damjan@apache.org ---
When SheetDataBuffer::setStringCell() is called on a working document, this is
the stack trace to it:

---snip---
#0  oox::xls::SheetDataBuffer::setStringCell(oox::xls::CellModel const&, int)
(this=0x80dcd1d20, rModel=..., nStringId=0) at
source/xls/sheetdatabuffer.cxx:373
#1  0x000000080eac9f43 in oox::xls::SheetDataContext::onEndElement()
(this=0x80dfa3980) at source/xls/sheetdatacontext.cxx:223
#2  0x000000080e711c63 in oox::core::ContextHandler2Helper::implEndElement(int)
(this=0x80dfa39c0, nElement=852948) at source/core/contexthandler2.cxx:120
#3  0x000000080e71220f in oox::core::ContextHandler2::endFastElement(int)
(this=0x80dfa3980, nElement=852948) at source/core/contexthandler2.cxx:209
#4  0x000000080ee3f4cc in
sax_fastparser::FastSaxParser::callbackEndElement(char const*)
(this=0x80dbd27c0) at source/fastparser/fastparser.cxx:849
#5  0x0000000807afed94 in  () at /usr/local/lib/libexpat.so.1
#6  0x0000000807afbd37 in  () at /usr/local/lib/libexpat.so.1
#7  0x0000000807afaaf9 in  () at /usr/local/lib/libexpat.so.1
#8  0x0000000807af7567 in  () at /usr/local/lib/libexpat.so.1
#9  0x0000000807af6d8b in XML_ParseBuffer () at /usr/local/lib/libexpat.so.1
#10 0x000000080ee3e39f in sax_fastparser::FastSaxParser::parse()
(this=this@entry=0x80dbd27c0) at source/fastparser/fastparser.cxx:646
---snip---

Frame #1, oox::xls::SheetDataContext::onEndElement(), is helpful.

---snip---
    167 void SheetDataContext::onEndElement()
    168 {
    169     if( getCurrentElement() == XLS_TOKEN( c ) )
    170     {
    171         // try to create a formula cell
    172         if( mbHasFormula ) switch( maFmlaData.mnFormulaType )
    173         {
...
    205         if( !mbHasFormula )
    206         {
    207             // no formula created: try to set the cell value
    208             if( maCellValue.getLength() > 0 ) switch(
maCellData.mnCellType )
    209             {
    210                 case XML_n:
    211                     mrSheetData.setValueCell( maCellData,
maCellValue.toDouble() );
    212                 break;
    213                 case XML_b:
    214                     mrSheetData.setBooleanCell( maCellData,
maCellValue.toDouble() != 0.0 );
    215                 break;
    216                 case XML_e:
    217                     mrSheetData.setErrorCell( maCellData, maCellValue
);
    218                 break;
    219                 case XML_str:
    220                     mrSheetData.setStringCell( maCellData, maCellValue
);
    221                 break;
    222                 case XML_s:
    223                     mrSheetData.setStringCell( maCellData,
maCellValue.toInt32() );
    224                 break;
    225             }
    226             else if( (maCellData.mnCellType == XML_inlineStr) &&
mxInlineStr.get() )
    227             {
    228                 mxInlineStr->finalizeImport();
    229                 mrSheetData.setStringCell( maCellData, mxInlineStr );
    230             }
    231             else
    232             {
    233                 // empty cell, update cell type
    234                 maCellData.mnCellType = XML_TOKEN_INVALID;
    235                 mrSheetData.setBlankCell( maCellData );
    236             }
    237         }
---snip---


Putting a breakpoint on line 172 with the condition "maCellValue.getLength() >
0" shows it is never triggered, and stepping through shows we always end in
lines 234-235, with the "empty cell, update cell type" comment.

So what on earth is going wrong with maCellValue?

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 128551] XSLX import: missing all text

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=128551

Keith N. McKenna <kn...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |knmc@apache.org
             Status|RESOLVED                    |CLOSED

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 128551] XSLX import: missing all text

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=128551

damjan@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #4 from damjan@apache.org ---
When I compared the data between a working file and a broken file, I
immediately saw the problem.

Good file:

---snip---
        <sheetData>
                <row r="1" ht="16" customHeight="true">
                        <c r="B1" s="2" t="s">
                                <v>0</v>
                        </c>
                        <c r="C1" s="2" t="e"/>
                        <c r="D1" s="2" t="e"/>
                        <c r="E1" s="2" t="e"/>
                </row>
---snip---

Bad file:

---snip---
  <sheetData>
    <row ht="87" customHeight="1">
      <c s="114"/>
      <c s="33"/>
      <c s="33"/>
      <c s="121" t="s">
        <v>53879</v>
      </c>
      <c r="G1" s="33"/>
      <c s="33"/>
      <c s="82"/>
    </row>
---snip---

The bad file has no r="..." attributes on its cells (<c> elements).

Manually adding them gets the cells to show.

This is a duplicate of bug 127672.

*** This issue has been marked as a duplicate of issue 127672 ***

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 128551] XSLX import: missing all text

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=128551

damjan@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Latest|---                         |4.2.0-dev
    Confirmation in|                            |
           Keywords|                            |ms_interoperability

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 128551] XSLX import: missing all text

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=128551

--- Comment #2 from damjan@apache.org ---
Maybe sharedStrings.xml isn't accessible for some reason, like in bug 126720,
so strings can't be loaded.

Nope, debugging source/xls/workbookfragment.cxx:208 like I did in bug 126720
and going a bit further along oox::xls::WorkbookFragment::finalizeImport(),
then doing:

---snip---
(gdb) print getSharedStrings()
... std::vector of length 123253 ...
---snip---

so all the shared strings are loaded. I can't look at their contents though,
the RichString class doesn't make that easy.

But instead of looking at where data is coming in, let me look at where data is
going out into the spreadsheet.

Various methods on the SheetDataBuffer class look like they populate the
spreadsheet. A breakpoint on "SheetDataBuffer::setStringCell" never triggers
while loading this document, while it does trigger while loading other
documents that do show cell contents.

So the problem appears to be that some internal flow of data is broken, which
is stopping the parsed data from reaching the spreadsheet.

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 128551] XSLX import: missing all text

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=128551

--- Comment #1 from damjan@apache.org ---
Oh and Apache POI also opens them perfectly and is able to extract data just
fine.

-- 
You are receiving this mail because:
You are the assignee for the issue.