Posted to dev@poi.apache.org by bu...@apache.org on 2021/01/21 17:27:32 UTC

[Bug 65096] New: Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

            Bug ID: 65096
           Summary: Apache POI Excel XLSX Streaming XML not correctly
                    reading multiple inline Strings
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SXSSF
          Assignee: dev@poi.apache.org
          Reporter: JackPGreen@Gmail.com
  Target Milestone: ---

Created attachment 37706
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37706&action=edit
Example

This was raised from a question on Stack Overflow:
https://stackoverflow.com/q/65789807

I've got an XLSX Excel file with a single cell.

When loaded using POI's WorkbookFactory, it's read correctly as a single cell.

When read using POI's XSSFSheetXMLHandler, it's read as though it was two
separate cells.

When looking at the underlying sheet.xml, you'd expect to see a single item of
text per cell, but here it's in two blocks - one formatted using a different
font to the other.
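
To make the symptom concrete, here is a minimal sketch in plain SAX (not
POI's actual handler code). The XML fragment is a hypothetical stand-in for
the attached sheet.xml, which is not reproduced in this thread: one cell whose
inline string (<is>) holds two rich-text runs, each with its own <t> element.
The class and method names are made up for illustration. A handler that
reports a value at every closing </t> sees two values for what is one cell:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NaiveInlineStringReader {
    // Hypothetical fragment: a single cell with two differently formatted runs.
    static final String SAMPLE =
        "<sheetData><row r=\"1\"><c r=\"A1\" t=\"inlineStr\"><is>"
      + "<r><t>Hello </t></r><r><rPr><b/></rPr><t>World</t></r>"
      + "</is></c></row></sheetData>";

    // Reports one value per <t> element, which splits a multi-run cell.
    static List<String> readValues(String xml) throws Exception {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inT;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("t".equals(qName)) {
                    inT = true;
                    text.setLength(0); // start a fresh value for every <t>
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inT) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("t".equals(qName)) {
                    inT = false;
                    values.add(text.toString()); // emitted too early: the cell is not finished
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Two entries are printed even though the fragment contains one cell.
        System.out.println(readValues(SAMPLE));
    }
}
```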

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #7 from PJ Fanning <fa...@yahoo.com> ---
In the short term, can you ask the owners of the proprietary software not to
use multiple <t> elements for a cell?



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #3 from Jack <Ja...@Gmail.com> ---
Created attachment 37709
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37709&action=edit
SampleApplication.java



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

PJ Fanning <fa...@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #11 from PJ Fanning <fa...@yahoo.com> ---
This fix was in the 5.0.0 release and no issues have been reported (afaics).



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #10 from PJ Fanning <fa...@yahoo.com> ---
I've tried a fix (r1885770). So far, it looks like the streaming xlsx parser
code is somewhat undertested, so I hope I haven't broken other use cases while
trying to fix this case.



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #6 from Jack <Ja...@Gmail.com> ---
(In reply to PJ Fanning from comment #5)
> This is probably a bug but do you have any idea what produced this xlsx
> file? The sheet1.xml is formatted and the namespace declarations are
> different from most xlsx files I've seen. This is just out of interest.

It was produced by a piece of proprietary software; unfortunately, that's all
I can disclose.
I extracted this segment from a larger document. I formatted the XML, but the
namespace is as it was in the original file.



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #1 from Jack <Ja...@Gmail.com> ---
Created attachment 37707
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37707&action=edit
Screenshot



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

PJ Fanning <fa...@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #5 from PJ Fanning <fa...@yahoo.com> ---
This is probably a bug but do you have any idea what produced this xlsx file?
The sheet1.xml is formatted and the namespace declarations are different from
most xlsx files I've seen. This is just out of interest.



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #2 from Jack <Ja...@Gmail.com> ---
Created attachment 37708
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37708&action=edit
Sheet.xml



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #9 from Jack <Ja...@Gmail.com> ---
(In reply to PJ Fanning from comment #7)
> In short term, can you ask the owners of the proprietary software not to use
> multiple <t> elements for a cell?

Unfortunately, as these files exist I need to be able to load them.

I worked around this by checking whether the "is" tag is still open (state I
access via reflection), storing the values while it is, and only reading the
combined value once the tag is closed.
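
The buffering idea can be sketched as follows. This is a standalone
illustration in plain SAX under stated assumptions, not the reflection-based
workaround itself and not the fix that went into POI: the XML fragment, class
name and method name are hypothetical. Text from every <t> inside an open <is>
is appended to one buffer, and the cell value is reported only when </is>
closes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedInlineStringReader {
    // Hypothetical fragment: a single cell with two differently formatted runs.
    static final String SAMPLE =
        "<sheetData><row r=\"1\"><c r=\"A1\" t=\"inlineStr\"><is>"
      + "<r><t>Hello </t></r><r><rPr><b/></rPr><t>World</t></r>"
      + "</is></c></row></sheetData>";

    // Reports one value per <is> element by joining the text of all its <t> children.
    static List<String> readCells(String xml) throws Exception {
        List<String> cells = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder cellText = new StringBuilder();
            private boolean inIs;
            private boolean inT;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("is".equals(qName)) {
                    inIs = true;
                    cellText.setLength(0); // reset once per inline string, not per <t>
                } else if ("t".equals(qName) && inIs) {
                    inT = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inT) {
                    cellText.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("t".equals(qName)) {
                    inT = false; // keep buffering; more runs may follow
                } else if ("is".equals(qName)) {
                    inIs = false;
                    cells.add(cellText.toString()); // the whole cell is now complete
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return cells;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readCells(SAMPLE));
    }
}
```

For simplicity the sketch joins every <t> it finds inside <is>; a fuller
version would probably want to skip text belonging to phonetic runs (<rPh>).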



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #4 from Jack <Ja...@Gmail.com> ---
Created attachment 37710
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37710&action=edit
Sample Output



[Bug 65096] Apache POI Excel XLSX Streaming XML not correctly reading multiple inline Strings

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65096

--- Comment #8 from PJ Fanning <fa...@yahoo.com> ---
The same bug exists in excel-streaming-reader. I have added a fix:
https://github.com/pjfanning/excel-streaming-reader/pull/29
