Posted to user@nutch.apache.org by DigitalPebble <ju...@digitalpebble.com> on 2007/11/07 15:36:20 UTC
nutch-user@lucene.apache.org
Hi Milan,
We have developed a Nutch plugin that could be used for this; it uses our
text classification library. The plugin consists of a Nutch indexer, which
creates a special field for the documents, and a searcher, which allows you to
switch the filter on.
We have used it for classifying spam on forums, but I am sure that it
would work on porn just as well. You can find more details on our Text
Classification API at http://www.digitalpebble.com/solutionsTC.html. The
Nutch plugin is just a wrapper for that library.
Best,
Julien
--
http://www.digitalpebble.com
Open Source Solutions for Text Engineering
-------- Original Message --------
Subject: SaveSearch or Adult Filter
Date: Wed, 07 Nov 2007 14:24:37 +0000
From: Milan Krendzelak <mk...@mtld.mobi>
Reply-To: nutch-user@lucene.apache.org
To: nutch-user@lucene.apache.org
Hi,
does anybody have an idea how to implement safe search in Nutch?
I think it would be cool to use a Bayesian technique to classify a web site
as adult (porn) and store a flag in the index. Of course, other techniques
could be used as well: regex, blacklists, etc.
Cheers,
Milan Krendzelak
Senior Software Developer
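
For illustration, the Bayesian idea in the question could be sketched roughly
as follows. This is only a minimal sketch of multinomial naive Bayes with
add-one smoothing, not the DigitalPebble library and not a Nutch plugin. The
class AdultFlagClassifier, its methods, and the labels "adult" and "clean" are
all hypothetical names, and the tokenizer is deliberately naive. The boolean
it returns is the kind of flag that could be stored in an index field.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: multinomial naive Bayes with add-one (Laplace) smoothing.
// None of these names come from Nutch or from the DigitalPebble library.
class AdultFlagClassifier {
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> totalTokens = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Naive tokenizer (assumption): lower-case, split on anything that is not a-z or 0-9.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    // Add one labelled training document to the model.
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
            totalTokens.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // log P(label) + sum over tokens of log P(token | label), smoothed.
    private double logScore(String label, String text) {
        if (!docCounts.containsKey(label)) {
            throw new IllegalStateException("No training data for label: " + label);
        }
        double score = Math.log((double) docCounts.get(label) / totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        int total = totalTokens.getOrDefault(label, 0);
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            int count = counts.getOrDefault(token, 0);
            score += Math.log((count + 1.0) / (total + vocabulary.size()));
        }
        return score;
    }

    // True when the "adult" class scores higher than the "clean" class.
    boolean isAdult(String text) {
        return logScore("adult", text) > logScore("clean", text);
    }
}
```

A caller would train the classifier on labelled pages with train("adult", ...)
and train("clean", ...), then call isAdult(pageText) at indexing time. How the
resulting flag is written to the index and used as a search filter depends on
the Nutch version and is not shown here.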
Re: SaveSearch or Adult
Posted by Milan Krendzelak <mk...@mtld.mobi>.
Hi Julien,
thanks a lot for the hint, already looking into it ;-).
Cheers,
Milan
On Wed, 2007-11-07 at 14:36 +0000, DigitalPebble wrote:
> Hi Milan,
>
> We have developed a Nutch plugin that could be used for this; it uses our
> text classification library. The plugin consists of a Nutch indexer, which
> creates a special field for the documents, and a searcher, which allows you to
> switch the filter on.
> We have used it for classifying spam on forums, but I am sure that it
> would work on porn just as well. You can find more details on our Text
> Classification API at http://www.digitalpebble.com/solutionsTC.html. The
> Nutch plugin is just a wrapper for that library.
>
> Best,
>
> Julien
>