Posted to user@nutch.apache.org by Dima Mazmanov <nu...@proservice.ge> on 2006/04/18 11:04:18 UTC
Nutch shows same results multiple times.
Hi all!!
I'm running nutch-0.7.1.
Here is the result of my search.
ArGo Software Design Homepage
[html] - 30.2 k -
... Look of our Web Site Our web site has new look and ... link on the ...
http://www.argosoft.org/RootPages/Default.aspx (Cached)
ArGo Software Design Homepage
[html] - 30.2 k -
... Look of our Web Site Our web site has new look and ... link on the ...
http://www.argosoft.com/rootpages/Default.aspx (Cached)
ArGo Software Design Homepage
[html] - 30.2 k -
... Look of our Web Site Our web site has new look and ... link on the ...
http://www.argosoft.com/RootPages/Default.aspx (Cached)
ArGo Software Design Homepage
[html] - 30.2 k -
... Look of our Web Site Our web site has new look and ... link on the ...
http://www.argosoft.org/rootpages/Default.aspx (Cached)
As you can see, one result is shown multiple times.
Why is that?
What is the difference between these links? I don't see any.
And how can I avoid this problem?
Thanks,
Regards, Dima
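For readers comparing the four links: they are not identical strings. They differ in the host (argosoft.org vs argosoft.com) and in the case of the first path segment (RootPages vs rootpages), so a crawler that compares URLs literally will treat them as four separate pages, even though the server apparently returns the same content for all of them. A minimal Python sketch that makes this visible is below. It is only an illustration, not Nutch code, and the function names are made up for this example.

```python
from urllib.parse import urlsplit

# The four result URLs from the message above.
urls = [
    "http://www.argosoft.org/RootPages/Default.aspx",
    "http://www.argosoft.com/rootpages/Default.aspx",
    "http://www.argosoft.com/RootPages/Default.aspx",
    "http://www.argosoft.org/rootpages/Default.aspx",
]


def host_and_path(url):
    """Split a URL into the two parts that vary in this example."""
    parts = urlsplit(url)
    return parts.hostname, parts.path


def distinct_exact(url_list):
    """Distinct (host, path) pairs when the path is compared literally."""
    return {host_and_path(u) for u in url_list}


def distinct_ignoring_path_case(url_list):
    """Distinct pairs when path case is ignored.

    Assumption: the server treats paths case-insensitively. That is not
    true of URLs in general, so this is a site-specific choice.
    """
    return {(host, path.lower()) for host, path in map(host_and_path, url_list)}


if __name__ == "__main__":
    for host, path in sorted(distinct_exact(urls)):
        print(host, path)
```

Compared literally, all four entries are distinct; ignoring path case leaves two, one per host. The two hosts would still need to be recognised as mirrors of each other by some other means, such as comparing page content.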
Re: Nutch shows same results multiple times.
Posted by Dima Mazmanov <nu...@proservice.ge>.
Well, my script already contains this command...
> Run bin/nutch dedup segments dedup.tmp
Re: Nutch shows same results multiple times.
Posted by "Håvard W. Kongsgård" <h....@niap.no>.
Run bin/nutch dedup segments dedup.tmp
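As background on what a dedup pass of this kind does: the usual idea is to drop entries whose content hashes to the same value, keeping one URL per distinct page body. The Python sketch below illustrates that idea only. It is not the Nutch implementation, and the function name and the (url, content) input shape are assumptions made for this example.

```python
import hashlib


def dedup_by_content(pages):
    """Keep the first URL seen for each distinct content hash.

    `pages` is a list of (url, content_bytes) pairs. Both the name and
    the input shape are hypothetical, chosen for this illustration.
    """
    seen = set()
    kept = []
    for url, content in pages:
        # Exact hash of the page body (MD5 in this sketch).
        digest = hashlib.md5(content).hexdigest()
        if digest in seen:
            continue  # same bytes already indexed under another URL
        seen.add(digest)
        kept.append(url)
    return kept


if __name__ == "__main__":
    body = b"<html>Our web site has a new look</html>"
    pages = [
        ("http://www.argosoft.org/RootPages/Default.aspx", body),
        ("http://www.argosoft.com/rootpages/Default.aspx", body),
    ]
    print(dedup_by_content(pages))
```

One thing to keep in mind with an exact hash: if two copies of a page differ by even a single byte, their hashes differ and both copies are kept.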