Posted to user@tika.apache.org by Tucker B <ba...@gmail.com> on 2019/05/14 17:00:20 UTC

Configuring mime type detection for password protected OOXML

I have a password protected xlsx file. The default mime type detection
returns a mime type of "application/x-tika-ooxml-protected". Is it
possible to configure the mime type detection to return the underlying
content type, e.g.
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"? I
didn't see any configuration option for this that could be overridden in
custom-mimetypes.xml.

Re: Configuring mime type detection for password protected OOXML

Posted by Tim Allison <ta...@apache.org>.
We have a unit test for an xlsx file with the default password, and
it shows that the content type is updated to
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet". In
short, that file works as I'd expect it to. It might be nice to include
in the metadata that the file was initially encrypted, but that result
is good enough for me for now.

However, when I just now tried to open an xlsx file with an actual
password, I got an exception. This is messy, and probably a reason
to respin 1.21-rc1. Ugh.

Fellow devs, see TIKA-2873. Ugh.

On Tue, May 14, 2019 at 2:05 PM Tucker B <ba...@gmail.com> wrote:
>
> On Tue, 14 May 2019, 13:52 Tim Allison, <ta...@apache.org> wrote:
>>
>> Hi Tucker,
>>   I know only a little about this area, but I think password protected
>> xlsx files (and ooxml generally) are encrypted inside an OLE package
>> so you can't even get to the underlying ooxml/zip file until you've
>> decrypted the file.
>
>
> That is my understanding as well. And can confirm based on the OfficeParser code paths for x-tika-ooxml-protected.
>
>> Do you have the passwords to these files?
>
>
> In most cases they are the default password. So I might need to add a custom mimetype detector to add as a composite detector for handling the case where the default password will work.
>
>> On Tue, May 14, 2019 at 1:00 PM Tucker B <ba...@gmail.com> wrote:
>> >
>> > I have a password protected xlsx file. The default mime type detection
>> > returns a mime type of "application/x-tika-ooxml-protected". Is it
>> > possible to configure the mime type detection to return the underlying
>> > content type, e.g.
>> > "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet". I
>> > didn't see any configuration options available to override in
>> > custom-mimetypes.xml.

Re: Configuring mime type detection for password protected OOXML

Posted by Tucker B <ba...@gmail.com>.
On Tue, 14 May 2019, 13:52 Tim Allison, <ta...@apache.org> wrote:

> Hi Tucker,
>   I know only a little about this area, but I think password protected
> xlsx files (and ooxml generally) are encrypted inside an OLE package
> so you can't even get to the underlying ooxml/zip file until you've
> decrypted the file.


That is my understanding as well, and I can confirm it from the OfficeParser
code paths for x-tika-ooxml-protected.

> Do you have the passwords to these files?

In most cases they use the default password. So I might need to write a
custom mimetype detector and add it to the composite detector to handle the
case where the default password works.
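
A rough sketch of that idea, in plain Java with no Tika dependency. Everything
here is hypothetical: DetectorChain and its step functions are made-up names,
and this simple first-answer-wins chain is only an assumption about how a
custom step could sit in front of the default detection. Tika's own composite
detector may combine results differently. The step that tries the default
password is left to the caller, because the decryption itself is not shown.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; none of these names are Tika classes.
final class DetectorChain {
    // Generic type returned when no step recognises the data.
    static final String FALLBACK = "application/octet-stream";

    // Each step returns a type, or Optional.empty() when it has no opinion.
    // Steps are tried in list order and the first answer wins.
    static String detect(List<Function<byte[], Optional<String>>> steps, byte[] data) {
        for (Function<byte[], Optional<String>> step : steps) {
            Optional<String> type = step.apply(data);
            if (type.isPresent()) {
                return type.get();
            }
        }
        return FALLBACK;
    }
}
```

The custom step would go first in the list and return the underlying type
only when the default password opens the file. Otherwise it returns empty and
the regular detection still reports "application/x-tika-ooxml-protected".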

> On Tue, May 14, 2019 at 1:00 PM Tucker B <ba...@gmail.com> wrote:
> >
> > I have a password protected xlsx file. The default mime type detection
> > returns a mime type of "application/x-tika-ooxml-protected". Is it
> > possible to configure the mime type detection to return the underlying
> > content type, e.g.
> > "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet". I
> > didn't see any configuration options available to override in
> > custom-mimetypes.xml.

Re: Configuring mime type detection for password protected OOXML

Posted by Tim Allison <ta...@apache.org>.
Hi Tucker,
  I know only a little about this area, but I think password protected
xlsx files (and ooxml generally) are encrypted inside an OLE package
so you can't even get to the underlying ooxml/zip file until you've
decrypted the file.  Do you have the passwords to these files?
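
To illustrate the point about the OLE wrapper, here is a minimal sketch in
plain Java, assuming only the standard file signatures. A regular xlsx is a
ZIP file and starts with the ZIP signature, while an encrypted one starts
with the OLE2 signature, so a check of the leading bytes cannot see the xlsx
inside. ContainerSniffer and its methods are made-up names for illustration
and are not part of Tika.

```java
import java.util.Arrays;

// Illustrative only: classifies data by its leading "magic" bytes.
// ContainerSniffer is a hypothetical name, not a Tika class.
final class ContainerSniffer {
    // OLE2 (Compound File Binary) signature.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature: 'P', 'K', 0x03, 0x04.
    static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // True when data begins with the given signature.
    static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Returns "ole2", "zip", or "unknown" for the leading bytes of a file.
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "ole2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "zip";
        }
        return "unknown";
    }
}
```

Passing the first eight bytes of a file to sniff is enough to tell the two
containers apart. Telling an encrypted xlsx from an encrypted docx needs the
password, since the real content type is only visible after decryption.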


On Tue, May 14, 2019 at 1:00 PM Tucker B <ba...@gmail.com> wrote:
>
> I have a password protected xlsx file. The default mime type detection
> returns a mime type of "application/x-tika-ooxml-protected". Is it
> possible to configure the mime type detection to return the underlying
> content type, e.g.
> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet". I
> didn't see any configuration options available to override in
> custom-mimetypes.xml.