Posted to user@tika.apache.org by Vish Ramachandran <vi...@hp.com> on 2012/06/15 22:31:34 UTC
MSI file types being detected as application/x-tika-msoffice
Hi,
Download the following file, which is the MSI installer for 7-Zip, a zip utility:
http://downloads.sourceforge.net/sevenzip/7z920-x64.msi
The following code:
String detectedType = new Tika().detect(new File("7z920-x64.msi"));
returns the MIME type application/x-tika-msoffice,
which is wrong for an MSI file.
Is this expected, or am I missing something else?
Thanks
Vish
Re: MSI file types being detected as application/x-tika-msoffice
Posted by Alex Ott <al...@gmail.com>.
If you ping me on Monday, I can try to find a small example. That said, I
think one can be generated using the Windows Installer toolkit.
On Sat, Jun 16, 2012 at 5:37 PM, Nick Burch <ni...@alfresco.com> wrote:
> On 16/06/12 09:14, Alex Ott wrote:
>>
>> An MSI file is a Windows Installer package, but internally it uses the
>> MS-CFB file format to store its data. To detect it correctly, the
>> detector should transform the object names (7z can do this, if I
>> remember correctly) into human-readable names, and then search for
>> special entries.
>
>
> We can certainly update the detectors to handle this new (to us!) kind of
> OLE2 based file. Does anyone know of a very small .msi file we can use in a
> unit test for this? (An example one might be a good bet)
>
> Nick
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott
Re: MSI file types being detected as application/x-tika-msoffice
Posted by Nick Burch <ni...@alfresco.com>.
On 16/06/12 09:14, Alex Ott wrote:
> An MSI file is a Windows Installer package, but internally it uses the
> MS-CFB file format to store its data. To detect it correctly, the
> detector should transform the object names (7z can do this, if I
> remember correctly) into human-readable names, and then search for
> special entries.
We can certainly update the detectors to handle this new (to us!) kind
of OLE2 based file. Does anyone know of a very small .msi file we can
use in a unit test for this? (An example one might be a good bet)
Nick
Re: MSI file types being detected as application/x-tika-msoffice
Posted by Alex Ott <al...@gmail.com>.
An MSI file is a Windows Installer package, but internally it uses the
MS-CFB file format to store its data. To detect it correctly, the
detector should transform the object names (7z can do this, if I
remember correctly) into human-readable names, and then search for
special entries.
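
To illustrate why a signature-only check cannot tell an MSI apart from an
Office document, here is a minimal sketch in plain Java (no Tika classes
involved). It only tests for the eight-byte header that every MS-CFB (OLE2)
compound file starts with. The class and method names are hypothetical and
are not part of Tika; the sketch does not attempt the name transformation
or the search for MSI-specific entries described above.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not a Tika class: checks only the container signature.
final class CfbSignatureCheck {

    // The first eight bytes of every MS-CFB (OLE2) compound file.
    private static final byte[] CFB_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes begin with the MS-CFB signature.
    static boolean hasCfbSignature(byte[] header) {
        if (header == null || header.length < CFB_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, CFB_MAGIC.length), CFB_MAGIC);
    }

    // Reads the first eight bytes of the stream and checks them.
    static boolean hasCfbSignature(InputStream in) throws IOException {
        byte[] header = new byte[CFB_MAGIC.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // stream is shorter than the signature
            }
            read += n;
        }
        return hasCfbSignature(header);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CfbSignatureCheck 7z920-x64.msi
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(hasCfbSignature(in));
        }
    }
}
```

A check like this should succeed for an .msi as well as for a .doc or
.xls, so a detector has to look inside the container at the entry names to
distinguish them.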
On Fri, Jun 15, 2012 at 10:31 PM, Vish Ramachandran
<vi...@hp.com> wrote:
> Hi,
>
> Download the following file, which is the MSI installer for 7-Zip, a zip utility:
>
> http://downloads.sourceforge.net/sevenzip/7z920-x64.msi
>
> The following code:
>
> String detectedType = new Tika().detect(new File("7z920-x64.msi"));
>
> results in mime: application/x-tika-msoffice
>
> which is wrong.
>
> Is this expected, or am I missing something else?
>
> Thanks
> Vish
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott