Posted to dev@tika.apache.org by "Victor Kazakov (JIRA)" <ji...@apache.org> on 2010/09/16 00:44:33 UTC

[jira] Updated: (TIKA-516) Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel"

     [ https://issues.apache.org/jira/browse/TIKA-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Kazakov updated TIKA-516:
--------------------------------

    Attachment: excel5.xls

An Excel 5 file.

> Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel"
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-516
>                 URL: https://issues.apache.org/jira/browse/TIKA-516
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.7
>            Reporter: Victor Kazakov
>            Priority: Minor
>         Attachments: excel5.xls
>
>
> Parsing the same Excel 5 file repeatedly with the AutoDetectParser gives inconsistent results: the file is detected as either "application/msword" or "application/vnd.ms-excel".
> The following code reproduces the problem:
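> 	import java.io.File;
> 	import java.io.FileInputStream;
> 	import org.apache.tika.metadata.Metadata;
> 	import org.apache.tika.parser.AutoDetectParser;
> 	import org.xml.sax.helpers.DefaultHandler;
>
> 	public class Excel5Detection {
> 		public static void main(String[] args) throws Exception {
> 			File file = new File("excel5.xls");
> 			// Parse the same file ten times and print the detected type each time.
> 			for (int i = 0; i < 10; i++) {
> 				// Open a fresh stream per iteration and close it before the next one.
> 				FileInputStream stream = new FileInputStream(file);
> 				try {
> 					AutoDetectParser parser = new AutoDetectParser();
> 					Metadata metadata = new Metadata();
> 					metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
> 					parser.parse(stream, new DefaultHandler(), metadata);
> 					System.out.println(metadata.get(Metadata.CONTENT_TYPE));
> 				} finally {
> 					stream.close();
> 				}
> 			}
> 		}
> 	}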
> An example output is:
> application/vnd.ms-excel
> application/msword
> application/msword
> application/vnd.ms-excel
> application/vnd.ms-excel
> application/vnd.ms-excel
> application/vnd.ms-excel
> application/msword
> application/vnd.ms-excel
> application/msword
> The Excel 5 file I used is attached to this bug.
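
To quantify the inconsistency, the detected types from repeated runs can be tallied. The following is a minimal sketch in plain Java that does not call Tika at all; the class name DetectionTally and the method tally are hypothetical names used only for illustration, and the input list is simply the ten lines of example output quoted above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DetectionTally {

    // Counts how often each detected content type occurs.
    // Hypothetical helper for illustration; not part of Tika.
    static Map<String, Integer> tally(List<String> detectedTypes) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String type : detectedTypes) {
            Integer seen = counts.get(type);
            counts.put(type, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The ten results from the example output in the report, in order.
        List<String> observed = Arrays.asList(
            "application/vnd.ms-excel",
            "application/msword",
            "application/msword",
            "application/vnd.ms-excel",
            "application/vnd.ms-excel",
            "application/vnd.ms-excel",
            "application/vnd.ms-excel",
            "application/msword",
            "application/vnd.ms-excel",
            "application/msword");

        Map<String, Integer> counts = tally(observed);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
        // A deterministic detector would yield exactly one distinct type.
        System.out.println("consistent: " + (counts.size() == 1));
    }
}
```

For the example output above, this reports four detections as application/msword and six as application/vnd.ms-excel. In a real check, the list would be filled from metadata.get(Metadata.CONTENT_TYPE) after each parse instead of being hard-coded.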

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.