Posted to dev@poi.apache.org by bu...@apache.org on 2020/05/27 09:55:39 UTC

[Bug 64473] New: OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

            Bug ID: 64473
           Summary: OPCPackage.open(fileName, PackageAccess.READ) does not
                    open valid xlsx file
           Product: POI
           Version: 4.1.2-FINAL
          Hardware: PC
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: OPC
          Assignee: dev@poi.apache.org
          Reporter: berek@bk.ru
  Target Milestone: ---

Created attachment 37268
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37268&action=edit
corrupted file

Contents of the unpacked xlsx file and Apache POI

The initial problem is that the xlsx file cannot be opened through POI
(OPCPackage.open(fileName, PackageAccess.READ)), although it opens in Excel.

A closer look at POI showed that the problem lies in the contents of the
xlsx file.
If you unzip the xlsx file, the xl folder contains, in addition to all the
other files, two entries that cause the problem:

xl/metadata
xl/metadata.xml

When the POI method OPCPackage.open(fileName, PackageAccess.READ) is used,
this leads to an error:

org.apache.poi.openxml4j.exceptions.InvalidFormatException: You can't add a
part with a part name derived from another part ! [M1.11]

which occurs in the PackagePartCollection.put method because the two file
names differ only by the extension.
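
To check whether a given file has this pair of entries without going through
POI, the zip entry names can be scanned with plain JDK classes. The following
is only a sketch: ZipNameClashFinder and its methods are made-up names, not
POI API, and it assumes the clash is always between one entry and the same
name with a dot suffix, as with xl/metadata and xl/metadata.xml.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper; not part of POI.
final class ZipNameClashFinder {

    /** Reports pairs such as "xl/metadata <-> xl/metadata.xml". */
    static List<String> findDotClashes(Collection<String> entryNames) {
        Set<String> names = new HashSet<>();
        for (String name : entryNames) {
            if (!name.endsWith("/")) { // ignore directory entries
                names.add(name);
            }
        }
        List<String> clashes = new ArrayList<>();
        for (String name : names) {
            int lastSlash = name.lastIndexOf('/');
            // Try every dot in the last segment as a possible cut point.
            for (int dot = name.indexOf('.', lastSlash + 1); dot >= 0;
                    dot = name.indexOf('.', dot + 1)) {
                String prefix = name.substring(0, dot);
                if (names.contains(prefix)) {
                    clashes.add(prefix + " <-> " + name);
                }
            }
        }
        Collections.sort(clashes);
        return clashes;
    }

    /** Reads all entry names from a zip (and therefore xlsx) file. */
    static List<String> entryNames(String zipPath) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                result.add(entry.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZipNameClashFinder <file.xlsx>");
            return;
        }
        for (String clash : findDotClashes(entryNames(args[0]))) {
            System.out.println(clash);
        }
    }
}
```

Run against a file with the layout described above, this would be expected to
list the xl/metadata pair; a file without such a pair prints nothing.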

If I copy the contents of the entire xlsx file into a newly created xlsx file
and save it, the xl/metadata entry is no longer there and the file opens
through POI without problems.
However, my task is not just to fix the file; I need to figure out why
this problem could arise.

It looks like a slightly incorrect xlsx, but I can still open it in Excel.
Is there any way to open it through POI?
Is there any idea where the xl/metadata entry in the contents of the xlsx
comes from?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 64473] [PATCH] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

Yury <yu...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED



[Bug 64473] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

Eugene <be...@bk.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #1 from Eugene <be...@bk.ru> ---
Also note the specification. I found only a draft version, but I think its
differences from the final version are small.

https://www.ecma-international.org/activities/Office%20Open%20XML%20Formats/Draft%20ECMA-376%203rd%20edition,%20March%202011/Office%20Open%20XML%20Part%202%20-%20Open%20Packaging%20Conventions.pdf

item 9.1.1.4 Part Naming 


A package implementer shall neither create nor recognize a part with a part
name derived from another part name by appending segments to it. [M1.11]
[Example: If a package contains a part named
“/segment1/segment2/.../segmentn”, then other parts in that package shall not
have names such as: “/segment1”, “segment1/segment2”, or
“/segment1/segment2/.../segmentn-1”. end example]

But also look at the item:

9.1.1 Part Names
Each part has a name. Part names refer to parts within a package. [Example: The
part name “/hello/world/doc.xml” contains three segments: “hello”, “world”, and
“doc.xml”. The first two segments in the sample represent levels in the logical
hierarchy and serve to organize the parts of the package, whereas the third
contains actual content. Note that segments are not explicitly represented as
folders in the package model, and no directory of folders exists in the
package model. end example]

In this example the segment “doc.xml” is the file name including its
extension, whereas in POI the method
PackagePart put(final PackagePartName partName, final PackagePart part)
of the class PackagePartCollection compares only the file names, without
their extensions, which is possibly a mistake.
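
Read literally, the quoted rule [M1.11] only concerns names built by appending
whole segments. A minimal sketch of such a check follows; DerivedPartNames and
isDerivedFrom are hypothetical names for illustration, not POI code, and the
comparison is done on the strings exactly as given, with no normalization.

```java
// Hypothetical illustration of rule M1.11 as quoted above; not POI code.
final class DerivedPartNames {

    /** True if candidate equals base plus at least one further "/segment". */
    static boolean isDerivedFrom(String candidate, String base) {
        return candidate.length() > base.length() + 1
                && candidate.startsWith(base)
                && candidate.charAt(base.length()) == '/';
    }

    public static void main(String[] args) {
        // An extra segment was appended: derived.
        System.out.println(isDerivedFrom("/xl/metadata/part.xml", "/xl/metadata")); // true
        // Only an extension differs within the same segment: not derived.
        System.out.println(isDerivedFrom("/xl/metadata.xml", "/xl/metadata"));      // false
    }
}
```

Under this reading, /xl/metadata and /xl/metadata.xml would be two unrelated
part names.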



[Bug 64473] [PATCH] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

--- Comment #9 from PJ Fanning <fa...@yahoo.com> ---
Hi Simone - we need a reproducible test case to debug this or you can try
debugging yourself. Can you open a new issue? We fixed Yury's problem with this
issue - so it is best to track any similar issues with a new bugzilla issue.



[Bug 64473] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

--- Comment #6 from Yury <yu...@gmail.com> ---
Created attachment 37964
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37964&action=edit
[PATCH] for fixing the issue

created by the following command:
ant -f patch.xml



[Bug 64473] [PATCH] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

--- Comment #8 from Simone D'Avico <si...@gmail.com> ---
I see the same error occur with poi 5.1.0 and poi-ooxml 5.1.0. The xlsx file I
am trying to open indeed contains both metadata and metadata.xml.

Is there any way I can help troubleshoot this?



[Bug 64473] [PATCH] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

--- Comment #10 from PJ Fanning <fa...@yahoo.com> ---
Giving us a file that reproduces the issue would be the main step towards
debugging the problem.



[Bug 64473] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

--- Comment #2 from PJ Fanning <fa...@yahoo.com> ---
It's possible we'll change POI code but the next release could be weeks away.

It's worth investigating where your xlsx file came from to find out why its
contents are not standard.



[Bug 64473] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

Nail Samatov <sa...@yandex.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sanail@yandex.ru

--- Comment #3 from Nail Samatov <sa...@yandex.ru> ---
Created attachment 37929
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37929&action=edit
Zip file with files to reproduce the bug

We have the same issue.
I tried to find steps that produce files which Apache POI can't read.

Pre-requisites:
Excel from MS Office 365
files 1.xlsx and 2.xlsx (you can find them in the attached zip file).
1.xlsx contains "xl/metadata" and 2.xlsx contains "xl/metadata.xml"

Steps:
1. Open 1.xlsx in Excel
2. Open 2.xlsx in Excel
3. Right click on the worksheet tab and select Move or Copy.
4. Select the 1.xlsx option at the To Book drop-down list.
5. Press OK.
6. Save 1.xlsx.

After saving you will have 1.xlsx, which contains both xl/metadata and
xl/metadata.xml.

You can find the result of the steps above in the folder "result-of-merge" in
the same attached zip file. This file can't be read by POI but can be opened
in Excel.



[Bug 64473] [PATCH] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

--- Comment #7 from PJ Fanning <fa...@yahoo.com> ---
Thanks Yury - merged with r1891692



[Bug 64473] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

--- Comment #4 from yurkom <yu...@gmail.com> ---
The issue appeared with the change made for ticket
https://bz.apache.org/bugzilla/show_bug.cgi?id=61942 in revision 1819708.

I think the dot symbol in the regexp is unnecessary in the line:

"(?=["+PackagingURIHelper.FORWARD_SLASH_STRING+".])";
                                                ^
                                                this

See
https://svn.apache.org/viewvc/poi/trunk/poi-ooxml/src/main/java/org/apache/poi/openxml4j/opc/PackagePartCollection.java?revision=1819708&view=markup#l64
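
The effect of the dot can be seen in isolation with String.split. The sketch
below is a standalone demo, not the POI source: PartNameSplitDemo and prefixes
are made-up names, FORWARD_SLASH_STRING is assumed to be "/", and the idea
that the growing prefixes are what gets compared against already registered
part names is an assumption about the put method. It relies on the
Java 8 and later behaviour of split for a zero-width match at index 0.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone demo of the two lookahead patterns; not the POI source.
final class PartNameSplitDemo {

    /** Splits name at the zero-width delimiter and returns the growing prefixes. */
    static List<String> prefixes(String name, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String segment : name.split(delimiterRegex)) {
            prefix.append(segment);
            result.add(prefix.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // With the dot in the character class the name is also cut before ".xml".
        System.out.println(prefixes("/xl/metadata.xml", "(?=[/.])"));
        // prints [/xl, /xl/metadata, /xl/metadata.xml]

        // Without the dot only the slashes act as cut points.
        System.out.println(prefixes("/xl/metadata.xml", "(?=[/])"));
        // prints [/xl, /xl/metadata.xml]
    }
}
```

With the dot, "/xl/metadata" shows up as a prefix of "/xl/metadata.xml", so an
existing part named /xl/metadata would look like a parent of it; without the
dot that prefix is never produced.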



[Bug 64473] [PATCH] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

Yury <yu...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|OPCPackage.open(fileName,   |[PATCH]
                   |PackageAccess.READ) does    |OPCPackage.open(fileName,
                   |not open valid xlsx file    |PackageAccess.READ) does
                   |                            |not open valid xlsx file



[Bug 64473] OPCPackage.open(fileName, PackageAccess.READ) does not open valid xlsx file

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64473

Yury <yu...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |61942

--- Comment #5 from Yury <yu...@gmail.com> ---
The issue appeared with the change made for ticket
https://bz.apache.org/bugzilla/show_bug.cgi?id=61942 in revision 1819708.

I think the dot symbol in the regexp is unnecessary in the line:

"(?=["+PackagingURIHelper.FORWARD_SLASH_STRING+".])";
                                                ^
                                                this

See
https://svn.apache.org/viewvc/poi/trunk/poi-ooxml/src/main/java/org/apache/poi/openxml4j/opc/PackagePartCollection.java?revision=1819708&view=markup#l64


Referenced Bugs:

https://bz.apache.org/bugzilla/show_bug.cgi?id=61942
[Bug 61942] Refactor PackagePartName handling and add getUnusedPartIndex method