Posted to dev@nutch.apache.org by "John Xing (JIRA)" <ji...@apache.org> on 2005/04/05 20:43:03 UTC
[jira] Commented: (NUTCH-33) MIME content type detector (using magic char sequences)
[ http://issues.apache.org/jira/browse/NUTCH-33?page=comments#action_62195 ]
John Xing commented on NUTCH-33:
--------------------------------
Just skimmed the code. The XML approach looks good.
Two minor comments:
(1) Make the magic check optional, controlled by a boolean property
such as mime.type.magic (true/false) in nutch-default.xml.
(2) Use org.apache.nutch.util.mime.
I think there is code in Hari Kodungallu's tarball that
covers primary/sub types.
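To make point (1) concrete, here is a minimal sketch of how such a switch might be read. It is not taken from the patch: java.util.Properties stands in for Nutch's configuration, the class and method names are hypothetical, and the default of "false" is an assumption. Only the property name mime.type.magic comes from the comment above.

```java
import java.util.Properties;

// Hypothetical sketch, not the NUTCH-33 patch.
// java.util.Properties stands in for Nutch's configuration here.
class MagicOptionSketch {
    // Property name suggested in the comment above.
    static final String MAGIC_PROPERTY = "mime.type.magic";

    // True only when the property is set to "true".
    // Defaulting to false when it is absent is an assumption.
    static boolean magicEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(MAGIC_PROPERTY, "false"));
    }

    // Prefers the magic-based type when the option is on and a match exists;
    // otherwise falls back to the extension-based type.
    static String resolveType(Properties conf, String typeFromExtension, String typeFromMagic) {
        if (magicEnabled(conf) && typeFromMagic != null) {
            return typeFromMagic;
        }
        return typeFromExtension;
    }
}
```

With the option off, the extension-based result is returned unchanged, so existing behaviour would be preserved unless the property is explicitly enabled.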
Thanks,
John
> MIME content type detector (using magic char sequences)
> -------------------------------------------------------
>
> Key: NUTCH-33
> URL: http://issues.apache.org/jira/browse/NUTCH-33
> Project: Nutch
> Type: New Feature
> Reporter: Jerome Charron
> Priority: Minor
> Attachments: NUTCH-33.patch, mime-types.tar.gz
>
> Extension-based content-type detection is not sufficient in some cases.
> The solution is to add a content-type detector based on magic character sequences, as in Apache httpd, for instance.
> (Note: I created this issue only to keep a trace, but I'm currently working on it)
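As a rough illustration of the magic-sequence idea in the issue description, the sketch below compares the leading bytes of some content against a small table of well-known signatures. It is not the code from NUTCH-33.patch: the class name, the table, and the choice to return null on no match are all assumptions made for the example, and the patch itself is described above as XML-driven rather than hard-coded.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of magic-based detection, not the NUTCH-33 patch.
class MagicSniffSketch {
    // MIME type -> leading byte sequence. A real detector would load these
    // from a configuration file instead of hard-coding them.
    static final Map<String, byte[]> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        MAGIC.put("image/gif", "GIF8".getBytes(StandardCharsets.US_ASCII));
        MAGIC.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G'});
    }

    // True when data begins with every byte of prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the first matching MIME type, or null when nothing matches
    // (the caller can then fall back to the file extension).
    static String detect(byte[] content) {
        for (Map.Entry<String, byte[]> entry : MAGIC.entrySet()) {
            if (startsWith(content, entry.getValue())) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

Returning null on no match keeps the extension-based detector as the fallback, which matches the issue's framing of magic detection as a supplement rather than a replacement.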