You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2021/11/25 10:41:05 UTC

[Bug 65706] New: Warning 'Entry [trash]/0000.dat is not valid' for valid trash item while opening .xlsx file

https://bz.apache.org/bugzilla/show_bug.cgi?id=65706

            Bug ID: 65706
           Summary: Warning 'Entry [trash]/0000.dat is not valid' for
                    valid trash item while opening .xlsx file
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Mac OS X 10.1
            Status: NEW
          Severity: normal
          Priority: P2
         Component: OPC
          Assignee: dev@poi.apache.org
          Reporter: bernhard.schuhmann@jedox.com
  Target Milestone: ---

Created attachment 38103
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=38103&action=edit
Excel file with trash item

POI 5.1.0 logs a warning while reading an .xlsx file which contains a trash
item, this is the log entry:

2021-11-25 10:00:21,801  WARN [pool-4-thread-1] (ZipPackage.java:340) - Entry
[trash]/0000.dat is not valid, so this part won't be add to the package.
org.apache.poi.openxml4j.exceptions.InvalidFormatException: Illegal character
in path at index 1: /[trash]/0000.dat
        at
org.apache.poi.openxml4j.opc.PackagingURIHelper.createPartName(PackagingURIHelper.java:500)
~[poi-ooxml-5.1.0.jar:5.1.0]
        at
org.apache.poi.openxml4j.opc.ZipPackage$EntryTriple.<init>(ZipPackage.java:337)
~[poi-ooxml-5.1.0.jar:5.1.0]
        at
org.apache.poi.openxml4j.opc.ZipPackage.lambda$getPartsImpl$0(ZipPackage.java:311)
~[poi-ooxml-5.1.0.jar:5.1.0]
        at
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
~[?:?]
        at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
~[?:?]
        at
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
        at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
~[?:?]
        at
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
~[?:?]
        at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
        at
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
        at
org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:314)
~[poi-ooxml-5.1.0.jar:5.1.0]
        at
org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:740)
~[poi-ooxml-5.1.0.jar:5.1.0]
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:315)
~[poi-ooxml-5.1.0.jar:5.1.0]

According to the spec trash items are valid content of an ooxml file (see
https://www.ecma-international.org/publications-and-standards/standards/ecma-376/
part 1):

9.1.5 Trash Items

Trash items represent parts that have been discarded or are no longer in use.
Trash items shall not conform to OPC part naming guidelines as defined in
ECMA-376-2 and shall not be associated with a content type. All trash items
shall follow the naming scheme: [trash]/HHHH.dat where H represents a
hexadecimal digit.

[Example: A package has two parts that must be updated in-place but both parts
have grown beyond their growth hints. The newer updated parts are added as new
ZIP items while the original parts are renamed to:

  [trash]/0000.dat
  [trash]/0001.dat

end example]

In our case I guess SharePoint has added this trash item.

Since it's valid content of the ooxml file it should probably not get logged as
a warning. I know warnings are not errors, but still these warnings create
noise. We can filter those warnings though - especially since we haven't paid
attention to POI logging before you were moving to log4j2.

BTW, there's a typo in the log message - it should read '...so this part won't
be added...'.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65706] Warning 'Entry [trash]/0000.dat is not valid' for valid trash item while opening .xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65706

--- Comment #1 from PJ Fanning <fa...@yahoo.com> ---
We are not going to fix every log statement. Can you try configuring log4j to
remove the logs you don't like? This is the whole point of logging frameworks.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65706] Warning 'Entry [trash]/0000.dat is not valid' for valid trash item while opening .xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65706

--- Comment #6 from PJ Fanning <fa...@yahoo.com> ---
I've added r1895370 - code will skip trash parts - this is probably temporary
because someone will eventually want to use POI to access trash parts.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65706] Warning 'Entry [trash]/0000.dat is not valid' for valid trash item while opening .xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65706

--- Comment #3 from Bernhard Schuhmann <be...@jedox.com> ---
Sure. But pls consider rethinking the log level as unnecessary warnings create
noise in the log. And a misconfigured filter in the logging framework could
swallow warnings that you might actually want to get.

Yes, meaningful logging is less important than getting it running right. But
it's still an indication for quality. I personally consider logging an integral
part of my software - it's the part ops is using when it's in production.

As said earlier - it is not a big issue for us as we have ignored POI logging
before, so we don't loose much if we filter those erroneous warnings out.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65706] Warning 'Entry [trash]/0000.dat is not valid' for valid trash item while opening .xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65706

--- Comment #2 from PJ Fanning <fa...@yahoo.com> ---
added r1895322

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65706] Warning 'Entry [trash]/0000.dat is not valid' for valid trash item while opening .xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65706

PJ Fanning <fa...@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65706] Warning 'Entry [trash]/0000.dat is not valid' for valid trash item while opening .xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65706

--- Comment #4 from Nick Burch <ap...@gagravarr.org> ---
Based on the bit of the ECMA spec, it does seem that trash files are allowed
(required?) to ignore all of the normal rules on what is a valid part name and
what isn't allowed as a part name

Not sure if we need to properly handle them, or just keep quiet about their
non-standard naming?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 65706] Warning 'Entry [trash]/0000.dat is not valid' for valid trash item while opening .xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=65706

--- Comment #5 from Bernhard Schuhmann <be...@jedox.com> ---
The discussion in [1] pointed me to the standard. I read it that these trash
items are considered valid content of an ooxml ZIP file and hence I'd assume
POI should not warn about it. I'd however be thankful for a log entry on level
DEBUG so if something is failing I could enable DEBUG logging and see the
warning about the trash item.

Not sure how the excel files we use for testing picked up all these trash
items. The attached file contains one, I'm now chasing several other regression
tests where excel files have more than one trash item. Haven't figured out who
is actually using them - but since for instance SharePoint is creating them
(see [1], we store our test files in SharePoint), it hopefully should also be
_using_ them - but then it's Microsoft...

[1]
https://social.technet.microsoft.com/Forums/en-US/287650b5-293c-48bc-90ec-9e13a61a46a6/office365-word-document-docx-banned-from-mailer-if-you-edit-properties-online-bug-?forum=sharepointgeneral

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org