Posted to dev@nutch.apache.org by Karine Storaker <ka...@gmail.com> on 2005/11/22 19:40:39 UTC

Questions about Nutch and enterprise search

Hi,

We are two master's students at BI Norwegian School of Management currently
writing a term paper on Nutch and whether Nutch could enter the enterprise
search market. In this regard, we were wondering if the Nutch technology
could easily be improved for this market. We were also wondering, more
specifically, whether Nutch is able to work with different file formats,
such as Microsoft Office?

If you have any white papers or other documentation on Nutch that is not of
too technical a nature, we would love it if you could send it to us!!

Hope to hear from you soon!!

Best regards,

Karine Storaker & Gerd Straume

Re: Questions about Nutch and enterprise search

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi,
I would say Nutch is already in the process of entering the enterprise
search market.
The MapReduce branch no longer has scalability limits (in theory,
800 boxes on Linux kernel 2.4; see Doug's posting).
Only search result quality and spam detection are still issues,
from my point of view.

There is a plugin system that lets you extend Nutch with additional
parsers, and I think there is already one for MS Word.
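
To make the idea of a parser plugin system concrete, here is a minimal Java
sketch of a registry that picks a parser by content type. It is only an
illustration under stated assumptions, not Nutch's actual plugin API: the
names ParserRegistryDemo, DocumentParser and ParserRegistry are hypothetical,
and the Word "parser" is a placeholder that does no real extraction.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names come from the Nutch code base.
class ParserRegistryDemo {

    /** A parser plugin turns the raw bytes of one content type into text. */
    interface DocumentParser {
        String parse(byte[] content);
    }

    /** Maps a MIME content type to the parser registered for it. */
    static class ParserRegistry {
        private final Map<String, DocumentParser> parsers = new HashMap<>();

        void register(String contentType, DocumentParser parser) {
            parsers.put(contentType, parser);
        }

        /** Returns the extracted text, or empty if no parser is registered. */
        Optional<String> parse(String contentType, byte[] content) {
            DocumentParser parser = parsers.get(contentType);
            return parser == null
                    ? Optional.empty()
                    : Optional.of(parser.parse(content));
        }
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();

        // Plain text: decode the bytes as UTF-8.
        registry.register("text/plain",
                bytes -> new String(bytes, StandardCharsets.UTF_8));

        // Placeholder only: a real Word parser would extract the document text.
        registry.register("application/msword",
                bytes -> "[Word text extraction would go here]");

        byte[] hello = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.parse("text/plain", hello).orElse("no parser"));
        System.out.println(registry.parse("application/pdf", new byte[0]).orElse("no parser"));
    }
}
```

Supporting a new file format in this sketch means registering one more
parser; the code that looks up and calls parsers does not change.
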
HTH
Stefan
On 22.11.2005 at 19:40, Karine Storaker wrote:

> Hi,
>
> We are two master's students at BI Norwegian School of Management currently
> writing a term paper on Nutch and whether Nutch could enter the enterprise
> search market. In this regard, we were wondering if the Nutch technology
> could easily be improved for this market. We were also wondering, more
> specifically, whether Nutch is able to work with different file formats,
> such as Microsoft Office?
>
> If you have any white papers or other documentation on Nutch that is not of
> too technical a nature, we would love it if you could send it to us!!
>
> Hope to hear from you soon!!
>
> Best regards,
>
> Karine Storaker & Gerd Straume

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net