Posted to user@nutch.apache.org by Hannu Väisänen <hv...@joyx.joensuu.fi> on 2009/07/02 07:32:43 UTC

How to tell Nutch that text files are text files?

I am using Nutch to index plain text and LaTeX files.

Nutch thinks that some of the files are of type
application/octet-stream.

I have added these lines to the file parse-plugins.xml:

    <mimeType name="application/octet-stream">
        <plugin id="parse-text" />
    </mimeType>

Now Nutch parses and indexes the files, but when I look at the search
results in Firefox/tomcat6, Nutch says that they are of type
application/octet-stream and does not show them.

How do I tell Nutch that it should show files of type
application/octet-stream as if they were text files?