Posted to user@nutch.apache.org by Sjaiful Bahri <sb...@rocketmail.com> on 2008/10/29 03:51:31 UTC

Crawl News Site

Hi Folks,

FYI, I am running a crawl algorithm on zipclue.com

Zipclue is a prototype News Search Engine (abbreviated NSE) that makes extensive use of the structure present in hypertext. Zipclue NSE is designed to crawl news on the web effectively and efficiently.

Zipclue stores and serves historical electronic news from around the world, subject to limiting parameters that we as the authors can set. In this particular instance, we rank articles by news release time (NRT), which in practice means trying to grab the latest news out there. The prototype, with a full-text and hyperlink database of electronic news, is available here:
http://zipclue.com
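
For anyone wondering what ranking by NRT could look like, here is a minimal sketch in Python. It is not Zipclue's actual code: the Article fields, the rank_by_release_time function, and the limit parameter are all hypothetical names chosen for illustration, and it assumes every crawled article already carries a parsed release timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Article:
    # Hypothetical record for one crawled news item.
    url: str
    title: str
    released: datetime  # assumed to hold the news release time (NRT)


def rank_by_release_time(articles, limit=10):
    """Return up to `limit` articles, newest release time first."""
    # sorted() is stable, so articles with equal timestamps keep crawl order.
    return sorted(articles, key=lambda a: a.released, reverse=True)[:limit]
```

The limit argument stands in for the kind of limiting parameter mentioned above; a real setup might also filter by date range or source before sorting.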

Regards,
Sjaiful Bahri

Re: Crawl News Site

Posted by Alexander Aristov <al...@gmail.com>.
Just my personal impression. Although you have implemented a very different
style of displaying results to distinguish your site from other similar sites,
it is not as useful as the traditional approach. The summary is very short. I
cannot understand what the article is about. It is very hard to match links
with the corresponding entry.

I would improve the page.

But anyway your attempt is very good.

Alex


-- 
Best Regards
Alexander Aristov