Posted to user@nutch.apache.org by "Ralf R. Kotowski" <rr...@enlle.com> on 2013/11/02 18:08:58 UTC

RE: Language based outlink filtering

Has anyone done something like this and is willing to share some sample
code?

Thanks

-----Original Message-----
From: Julien Nioche [mailto:lists.digitalpebble@gmail.com] 
Sent: Wednesday, October 02, 2013 1:00 PM
To: user@nutch.apache.org
Subject: Re: Language based outlink filtering

Hi,

You can do that by activating the language-identifier plugin and then writing
a custom ScoringFilter that removes the outlinks in its
distributeScoreToOutlinks() method.
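
A minimal sketch of the filtering decision, for illustration only. It is
plain Java and does not implement the real Nutch ScoringFilter interface:
the class name, the "lang" metadata key, and the String/Float stand-ins for
the URL and score objects are all assumptions. As far as I know, in Nutch 1.x
distributeScoreToOutlinks() receives the parse data and the collection of
outlink targets, so the same check would go there.

```java
import java.util.Collection;
import java.util.Map;

/**
 * Stand-alone sketch of the filtering logic only. It does NOT implement
 * Nutch's ScoringFilter interface; String and Float stand in for the URL
 * and score objects a real plugin would receive.
 */
class LanguageOutlinkSketch {

    /** Language whose pages may contribute outlinks (hypothetical setting). */
    private final String allowedLanguage;

    LanguageOutlinkSketch(String allowedLanguage) {
        this.allowedLanguage = allowedLanguage;
    }

    /**
     * Plays the role of distributeScoreToOutlinks(): drops every target
     * when the source page's detected language is missing or different
     * from the allowed one. Returns the number of outlinks kept.
     */
    int filterOutlinks(Map<String, String> parseMeta,
                       Collection<Map.Entry<String, Float>> targets) {
        // "lang" is an assumed metadata key; check which key the
        // language-identifier plugin actually writes in your version.
        String detected = parseMeta.get("lang");
        if (detected == null || !detected.equalsIgnoreCase(allowedLanguage)) {
            targets.clear(); // no outlinks are passed on from this page
        }
        return targets.size();
    }
}
```

With allowedLanguage set to "tr", a page whose metadata says "tr" keeps all
of its outlinks, while a page detected as "en" (or with no detected language)
ends up with an empty target collection.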

HTH

Julien


On 30 September 2013 11:41, ilhami Kalkan <il...@agmlab.com> wrote:

> Hi,
> I want to extract outlinks from a webpage with a specific language. Any
> ideas about how I can do this?
> Thanks
>
>


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble