Posted to issues@openoffice.apache.org by bu...@apache.org on 2023/01/06 11:05:03 UTC
[Issue 128551] New: XSLX import: missing all text
https://bz.apache.org/ooo/show_bug.cgi?id=128551
Issue ID: 128551
Issue Type: DEFECT
Summary: XSLX import: missing all text
Product: Calc
Version: 4.2.0-dev
Hardware: All
OS: All
Status: CONFIRMED
Severity: Normal
Priority: P5 (lowest)
Component: open-import
Assignee: issues@openoffice.apache.org
Reporter: damjan@apache.org
Target Milestone: ---
I have a bunch of huge XLSX spreadsheets (100000+ rows) from a 3rd party, and
they all show no text at all after opening. All the sheets are present, a
picture is present, some column backgrounds are present, but there is no text,
no numbers, no dates, nothing: only empty cells everywhere.
Excel and LibreOffice open them perfectly.
The fix for bug 126720 doesn't help here - it's a different issue.
I am not allowed to distribute these files, so I am using this bug to keep my
own notes while debugging OpenOffice.
--
You are receiving this mail because:
You are the assignee for the issue.
[Issue 128551] XSLX import: missing all text
--- Comment #3 from damjan@apache.org ---
When SheetDataBuffer::setStringCell() is called on a working document, this is
the stack trace leading to it:
---snip---
#0 oox::xls::SheetDataBuffer::setStringCell(oox::xls::CellModel const&, int) (this=0x80dcd1d20, rModel=..., nStringId=0) at source/xls/sheetdatabuffer.cxx:373
#1 0x000000080eac9f43 in oox::xls::SheetDataContext::onEndElement() (this=0x80dfa3980) at source/xls/sheetdatacontext.cxx:223
#2 0x000000080e711c63 in oox::core::ContextHandler2Helper::implEndElement(int) (this=0x80dfa39c0, nElement=852948) at source/core/contexthandler2.cxx:120
#3 0x000000080e71220f in oox::core::ContextHandler2::endFastElement(int) (this=0x80dfa3980, nElement=852948) at source/core/contexthandler2.cxx:209
#4 0x000000080ee3f4cc in sax_fastparser::FastSaxParser::callbackEndElement(char const*) (this=0x80dbd27c0) at source/fastparser/fastparser.cxx:849
#5 0x0000000807afed94 in () at /usr/local/lib/libexpat.so.1
#6 0x0000000807afbd37 in () at /usr/local/lib/libexpat.so.1
#7 0x0000000807afaaf9 in () at /usr/local/lib/libexpat.so.1
#8 0x0000000807af7567 in () at /usr/local/lib/libexpat.so.1
#9 0x0000000807af6d8b in XML_ParseBuffer () at /usr/local/lib/libexpat.so.1
#10 0x000000080ee3e39f in sax_fastparser::FastSaxParser::parse() (this=this@entry=0x80dbd27c0) at source/fastparser/fastparser.cxx:646
---snip---
Frame #1, oox::xls::SheetDataContext::onEndElement(), is helpful.
---snip---
167 void SheetDataContext::onEndElement()
168 {
169     if( getCurrentElement() == XLS_TOKEN( c ) )
170     {
171         // try to create a formula cell
172         if( mbHasFormula ) switch( maFmlaData.mnFormulaType )
173         {
...
205         if( !mbHasFormula )
206         {
207             // no formula created: try to set the cell value
208             if( maCellValue.getLength() > 0 ) switch( maCellData.mnCellType )
209             {
210                 case XML_n:
211                     mrSheetData.setValueCell( maCellData, maCellValue.toDouble() );
212                 break;
213                 case XML_b:
214                     mrSheetData.setBooleanCell( maCellData, maCellValue.toDouble() != 0.0 );
215                 break;
216                 case XML_e:
217                     mrSheetData.setErrorCell( maCellData, maCellValue );
218                 break;
219                 case XML_str:
220                     mrSheetData.setStringCell( maCellData, maCellValue );
221                 break;
222                 case XML_s:
223                     mrSheetData.setStringCell( maCellData, maCellValue.toInt32() );
224                 break;
225             }
226             else if( (maCellData.mnCellType == XML_inlineStr) && mxInlineStr.get() )
227             {
228                 mxInlineStr->finalizeImport();
229                 mrSheetData.setStringCell( maCellData, mxInlineStr );
230             }
231             else
232             {
233                 // empty cell, update cell type
234                 maCellData.mnCellType = XML_TOKEN_INVALID;
235                 mrSheetData.setBlankCell( maCellData );
236             }
237         }
---snip---
Putting a breakpoint on line 172 with the condition "maCellValue.getLength() >
0" shows it is never triggered, and stepping through shows we always end in
lines 234-235, with the "empty cell, update cell type" comment.
So what on earth is going wrong with maCellValue?
[Issue 128551] XSLX import: missing all text
Keith N. McKenna <kn...@apache.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |knmc@apache.org
             Status|RESOLVED                    |CLOSED
[Issue 128551] XSLX import: missing all text
damjan@apache.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
         Resolution|---                         |DUPLICATE
--- Comment #4 from damjan@apache.org ---
When I compared the data between a working file and a broken file, I
immediately saw the problem.
Good file:
---snip---
<sheetData>
  <row r="1" ht="16" customHeight="true">
    <c r="B1" s="2" t="s">
      <v>0</v>
    </c>
    <c r="C1" s="2" t="e"/>
    <c r="D1" s="2" t="e"/>
    <c r="E1" s="2" t="e"/>
  </row>
---snip---
Bad file:
---snip---
<sheetData>
  <row ht="87" customHeight="1">
    <c s="114"/>
    <c s="33"/>
    <c s="33"/>
    <c s="121" t="s">
      <v>53879</v>
    </c>
    <c r="G1" s="33"/>
    <c s="33"/>
    <c s="82"/>
  </row>
---snip---
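The bad file is missing the r="..." attribute on almost all of its cells (<c>
elements): in the excerpt above only one cell carries it, and the <row> element
has none either.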
Manually adding them gets the cells to show.
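
Since Excel, LibreOffice and Apache POI all open these files, they presumably
infer cell positions when r is absent. Below is a minimal standalone sketch of
one way to do that. It is not the OpenOffice code and not the fix from bug
127672. The names CellPos, parseA1 and inferRowPositions are hypothetical, and
the rule it applies (a cell without r sits one column to the right of the
previous cell, starting at column A) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Zero-based (column, row) position of a cell. Hypothetical helper type.
struct CellPos { int col; int row; };

// Parse an upper-case A1-style reference such as "G1" into a zero-based
// position. Returns false (leaving 'out' untouched) unless the input is
// one or more letters A-Z followed by a row number >= 1.
bool parseA1(const std::string& ref, CellPos& out)
{
    std::size_t i = 0;
    int col = 0;
    while (i < ref.size() && ref[i] >= 'A' && ref[i] <= 'Z')
        col = col * 26 + (ref[i++] - 'A' + 1);
    if (i == 0 || i == ref.size())
        return false;               // no letters, or no digits after them
    int row = 0;
    for (; i < ref.size(); ++i)
    {
        if (ref[i] < '0' || ref[i] > '9')
            return false;
        row = row * 10 + (ref[i] - '0');
    }
    if (row == 0)
        return false;
    out.col = col - 1;
    out.row = row - 1;
    return true;
}

// Assign a position to every <c> element of one row. 'refs' holds each
// cell's r attribute in document order, with an empty string where the
// attribute is absent. 'rowIndex' is the zero-based row to use for cells
// that carry no reference of their own.
// ASSUMPTION: a cell without r follows the previous cell by one column.
std::vector<CellPos> inferRowPositions(const std::vector<std::string>& refs,
                                       int rowIndex)
{
    assert(rowIndex >= 0);
    std::vector<CellPos> result;
    int nextCol = 0;
    for (const std::string& ref : refs)
    {
        CellPos pos = { nextCol, rowIndex };
        CellPos parsed;
        if (parseA1(ref, parsed))
            pos = parsed;           // an explicit reference wins
        result.push_back(pos);
        nextCol = pos.col + 1;      // the next implicit cell goes to the right
    }
    return result;
}
```

Under that assumption, feeding the seven cells of the bad row above (only the
fifth has r="G1") with rowIndex 0 would place the t="s" cell in column D and
the last two cells in columns H and I.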
This is a duplicate of bug 127672.
*** This issue has been marked as a duplicate of issue 127672 ***
[Issue 128551] XSLX import: missing all text
damjan@apache.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Latest|---                         |4.2.0-dev
    Confirmation in|                            |
           Keywords|                            |ms_interoperability
[Issue 128551] XSLX import: missing all text
--- Comment #2 from damjan@apache.org ---
Maybe sharedStrings.xml isn't accessible for some reason, like in bug 126720,
so strings can't be loaded.
Nope. I debugged source/xls/workbookfragment.cxx:208 as I did in bug 126720,
went a bit further along oox::xls::WorkbookFragment::finalizeImport(), and
then ran:
---snip---
(gdb) print getSharedStrings()
... std::vector of length 123253 ...
---snip---
so all the shared strings are loaded. I can't look at their contents, though,
because the RichString class doesn't make that easy.
But instead of looking at where data is coming in, let me look at where data is
going out into the spreadsheet.
Various methods on the SheetDataBuffer class look like they populate the
spreadsheet. A breakpoint on "SheetDataBuffer::setStringCell" never triggers
while loading this document, while it does trigger while loading other
documents that do show cell contents.
So the problem appears to be that some internal flow of data is broken, which
is stopping the parsed data from reaching the spreadsheet.
[Issue 128551] XSLX import: missing all text
--- Comment #1 from damjan@apache.org ---
Oh, and Apache POI also opens them perfectly and is able to extract data just
fine.