Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2007/10/18 00:04:51 UTC

[jira] Updated: (TIKA-76) Need to add test documents with wrong extensions.

     [ https://issues.apache.org/jira/browse/TIKA-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-76:
------------------------------

    Attachment: TIKA-76.patch

Instead of making real copies of the documents, we could always just feed an incorrect file name with the original resource stream.

See the attached patch for an example of how this could work with AutoDetectParserTest. The patch uses the AutoDetectParser on all the current test documents in the following configurations:

    * correct name and type hints
    * correct name but no type hint
    * correct name but incorrect type hint
    * incorrect type and no name hint
    * correct type but no name hint
    * correct type but incorrect name hint
    * incorrect name and no type hint
    * incorrect name and type hints
    * no name or type hints

It seems we currently need MIME magic tests for Excel, PowerPoint, RTF, plain text, Word, and XML.
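
The nine configurations listed above are the full cross product of three states (correct, incorrect, none) for the name hint and the same three states for the type hint. The following is a minimal sketch of how such a matrix could be enumerated. It is not the code from the attached patch: the class HintMatrix, its methods, and the sample file names and types are hypothetical, and the step that would hand each pair of hints plus the original resource stream to the parser is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from TIKA-76.patch.
class HintMatrix {
    /** State of one hint (file name or content type) supplied with the stream. */
    enum Hint { CORRECT, INCORRECT, NONE }

    /** One test configuration: a name-hint state paired with a type-hint state. */
    static final class Config {
        final Hint name;
        final Hint type;

        Config(Hint name, Hint type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Builds every pairing of name-hint and type-hint states (3 x 3 = 9). */
    static List<Config> allConfigs() {
        List<Config> configs = new ArrayList<>();
        for (Hint name : Hint.values()) {
            for (Hint type : Hint.values()) {
                configs.add(new Config(name, type));
            }
        }
        return configs;
    }

    /** Maps a hint state to the value a test would pass, or null for "no hint". */
    static String resolve(Hint hint, String correct, String incorrect) {
        switch (hint) {
            case CORRECT:
                return correct;
            case INCORRECT:
                return incorrect;
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        // Sample values are assumptions chosen for illustration only.
        for (Config c : allConfigs()) {
            String name = resolve(c.name, "testHTML.html", "testReallyHTML.doc");
            String type = resolve(c.type, "text/html", "application/msword");
            // A real test would parse the ORIGINAL resource stream here,
            // passing these (possibly misleading) hints along with it.
            System.out.println("name=" + name + ", type=" + type);
        }
    }
}
```

Because the hints are generated rather than stored as files, no copies of the test documents are needed; the same stream is reused for every configuration.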

> Need to add test documents with wrong extensions.
> -------------------------------------------------
>
>                 Key: TIKA-76
>                 URL: https://issues.apache.org/jira/browse/TIKA-76
>             Project: Tika
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>             Fix For: 0.1-incubator
>
>         Attachments: TIKA-76.patch
>
>
> We need to add test documents with misleading extensions to verify that the file header MIME type determination is taking precedence over the file name approach.
> I suggest copying existing files such as:
> cp testHTML.html testReallyHTML.doc
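
To illustrate what "file header taking precedence over the file name" means, here is a minimal sketch of header-first detection. It is an assumption-laden illustration, not Tika's detection code: the class HeaderFirstDetector, its tiny magic table (RTF and XML only), the extension map, and the file name testReallyRTF.doc are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch; Tika's real detector and magic table are more extensive.
class HeaderFirstDetector {
    // Illustrative magic prefixes for two formats only.
    private static final byte[] RTF_MAGIC = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] XML_MAGIC = "<?xml".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if data begins with the given prefix bytes. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Detects a type from the leading bytes, or returns null if none match. */
    static String detectByHeader(byte[] header) {
        if (startsWith(header, RTF_MAGIC)) return "application/rtf";
        if (startsWith(header, XML_MAGIC)) return "application/xml";
        return null;
    }

    /** Detects a type from the file extension, or returns null if unknown. */
    static String detectByName(String name) {
        if (name == null) return null;
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".rtf")) return "application/rtf";
        if (lower.endsWith(".xml")) return "application/xml";
        if (lower.endsWith(".doc")) return "application/msword";
        return null;
    }

    /** Header match wins; the name is only consulted when the header is inconclusive. */
    static String detect(byte[] header, String name) {
        String byHeader = detectByHeader(header);
        if (byHeader != null) return byHeader;
        String byName = detectByName(name);
        return byName != null ? byName : "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] rtf = "{\\rtf1\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);
        // RTF content behind a misleading .doc name: prints application/rtf
        System.out.println(detect(rtf, "testReallyRTF.doc"));
    }
}
```

A test document with a misleading extension exercises exactly the first branch of detect: if the name were consulted first, the result in the example would be application/msword instead.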

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (TIKA-76) Need to add test documents with wrong extensions.

Posted by "Keith R. Bennett" <kb...@bbsinc.biz>.
Jukka -

That looks good for comprehensive testing, but how do you feel about
TIKA-75?  This would have multiple uses, including being consistent with our
goal of providing MIME detection services.

- Keith



-- 
View this message in context: http://www.nabble.com/-jira--Created%3A-%28TIKA-76%29-Need-to-add-test-documents-with-wrong-extensions.-tf4643422.html#a13264427
Sent from the Apache Tika - Development mailing list archive at Nabble.com.