Posted to user@nutch.apache.org by Sjaiful Bahri <sb...@rocketmail.com> on 2009/01/27 10:53:44 UTC

Crawl News Web

FYI,
Zipclue is designed to crawl news information on the
web effectively and efficiently.

http://zipclue.com



Cheers 
iful at http://zipclue.com


      

Re: Crawl News Web

Posted by W <wi...@gmail.com>.
I don't know; ask the creator of zipclue, Sjaiful Bahri.

On Thu, Feb 12, 2009 at 12:39 PM, Saurabh Bhutyani <sa...@in.com> wrote:
> Hi Wildan, I don't find the recent news of last 23 days when I do a
> search on zipclue. What is the crawl frequency? Also are you storing and
> displaying the results from db? The search is quite slow.
>
> Original message
> From: Sjaiful Bahri <sbahri@rocketmail.com>
> Date: 29 Jan 09 10:51:36
> Subject: Re: Crawl News Web
> To: nutch-user@lucene.apache.org
>
> Hello Wildan,
> This is the process to crawl



-- 
---
OpenThink Labs
www.tobethink.com

Aligning IT and Education

>> 021-99325243
Y! : hawking_123
LinkedIn : http://www.linkedin.com/in/wildanmaulana

Re: Crawl News Web

Posted by Saurabh Bhutyani <sa...@in.com>.
Hi Wildan, I don't find the recent news of last 23 days when I do a
search on zipclue. What is the crawl frequency? Also are you storing and
displaying the results from db? The search is quite slow.

Original message
From: Sjaiful Bahri <sbahri@rocketmail.com>
Date: 29 Jan 09 10:51:36
Subject: Re: Crawl News Web
To: nutch-user@lucene.apache.org

Hello Wildan,

This is the process to crawl news sites:

1. Depth-first search to identify news sites
2. Crawl process using regular expressions
3. Save the resulting contents into a database
4. Users can then find news through the database

W wrote:
> Can you share the architecture you use? Are you also
> using Nutch for the backend?
>
> Regards,
> Wildan
>
> On Tue, Jan 27, 2009 at 4:53 PM, Sjaiful Bahri wrote:
> > FYI,
> > Zipclue is designed to crawl news information on the
> > web effectively and efficiently.
> >
> > http://zipclue.com
> >
> > Cheers
> > iful at http://zipclue.com
>
> --
> ---
> tobeThink!
> www.tobethink.com
>
> Aligning IT and Education
>
> >> 021-99325243
> Y! : hawking_123
> LinkedIn : http://www.linkedin.com/in/wildanmaulana

Cheers
iful at http://zipclue.com

Re: Crawl News Web

Posted by Sjaiful Bahri <sb...@rocketmail.com>.
Hello Wildan,

This is the process to crawl news sites:

1. Depth-first search to identify news sites
2. Crawl process using regular expressions
3. Save the resulting contents into a database
4. Users can then find news through the database
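
The four steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not Zipclue's actual code, which the
thread does not show: the in-memory PAGES dictionary stands in for real HTTP
fetching, the ARTICLE_RE pattern is a guess at what a news-article URL might
look like, and the SQLite table layout and all function names are
hypothetical.

```python
import re
import sqlite3

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
PAGES = {
    "http://example.com/": (
        "Home",
        ["http://example.com/news/", "http://example.com/about"],
    ),
    "http://example.com/news/": (
        "News index",
        [
            "http://example.com/news/2009/01/27/story-a.html",
            "http://example.com/news/2009/01/28/story-b.html",
        ],
    ),
    "http://example.com/news/2009/01/27/story-a.html": (
        "Story A about search engines",
        [],
    ),
    "http://example.com/news/2009/01/28/story-b.html": (
        "Story B about crawlers",
        ["http://example.com/"],
    ),
    "http://example.com/about": ("About us", []),
}

# Assumed shape of a news-article URL: /news/YYYY/MM/DD/slug.html
ARTICLE_RE = re.compile(r"/news/\d{4}/\d{2}/\d{2}/[\w-]+\.html$")


def dfs_crawl(start, fetch):
    """Step 1: depth-first traversal from `start`; returns URLs in visit order."""
    seen, order, stack = set(), [], [start]
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        page = fetch(url)
        if page is None:  # unknown or unreachable page
            continue
        order.append(url)
        _, links = page
        # Reverse so the first link on the page is explored first.
        stack.extend(reversed(links))
    return order


def save_articles(conn, urls, fetch):
    """Step 3: store each article's text in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news (url TEXT PRIMARY KEY, body TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO news VALUES (?, ?)",
        [(u, fetch(u)[0]) for u in urls],
    )
    conn.commit()


def search(conn, term):
    """Step 4: let users find news by querying the database."""
    rows = conn.execute(
        "SELECT url FROM news WHERE body LIKE ? ORDER BY url", (f"%{term}%",)
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
visited = dfs_crawl("http://example.com/", PAGES.get)
# Step 2: keep only URLs that match the regular expression.
articles = [u for u in visited if ARTICLE_RE.search(u)]
save_articles(conn, articles, PAGES.get)
print(search(conn, "crawlers"))
```

Serving searches from the database means a query never touches the live
sites, but results can only be as fresh as the most recent crawl.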

--- W <wi...@gmail.com> wrote:

> Can you share the architecture you use? Are you also
> using Nutch for the backend?
> 
> 
> Regards,
> Wildan
> 
> On Tue, Jan 27, 2009 at 4:53 PM, Sjaiful Bahri
> <sb...@rocketmail.com> wrote:
> > FYI,
> > Zipclue is designed to crawl news information on the
> > web effectively and efficiently.
> >
> > http://zipclue.com
> >
> >
> >
> > Cheers
> > iful at http://zipclue.com
> >
> >
> >
> >
> 
> 
> 
> -- 
> ---
> tobeThink!
> www.tobethink.com
> 
> Aligning IT and Education
> 
> >> 021-99325243
> Y! : hawking_123
> LinkedIn : http://www.linkedin.com/in/wildanmaulana
> 


Cheers 
iful at http://zipclue.com


      

Re: Crawl News Web

Posted by W <wi...@gmail.com>.
Can you share the architecture you use? Are you also using Nutch
for the backend?


Regards,
Wildan

On Tue, Jan 27, 2009 at 4:53 PM, Sjaiful Bahri <sb...@rocketmail.com> wrote:
> FYI,
> Zipclue is designed to crawl news information on the
> web effectively and efficiently.
>
> http://zipclue.com
>
>
>
> Cheers
> iful at http://zipclue.com
>
>
>
>



-- 
---
tobeThink!
www.tobethink.com

Aligning IT and Education

>> 021-99325243
Y! : hawking_123
LinkedIn : http://www.linkedin.com/in/wildanmaulana

Re: Crawl News Web

Posted by Saurabh Bhutyani <sa...@in.com>.
Hi Sjaiful Bahri, I don't find the recent news of last 23 days when I do a
search on zipclue. What is the crawl frequency? Also are you storing and
displaying the results from db? The search is quite slow.

Original message
From: Sjaiful Bahri <sbahri@rocketmail.com>
Date: 07 Feb 09 09:50:11
Subject: Re: Crawl News Web
To: nutch-user@lucene.apache.org, techcool.kumar@yahoo.com

it's not related to RSS..

Cool The Breezer wrote:
> Does it index all RSS feeds?
>
> --- On Tue, 1/27/09, Sjaiful Bahri wrote:
>
> > From: Sjaiful Bahri
> > Subject: Crawl News Web
> > To: nutch-user@lucene.apache.org
> > Date: Tuesday, January 27, 2009, 4:53 AM
> > FYI,
> > Zipclue is designed to crawl news information on the
> > web effectively and efficiently.
> >
> > http://zipclue.com
> >
> > Cheers
> > iful at http://zipclue.com

Cheers
iful at http://zipclue.com

Re: Crawl News Web

Posted by Sjaiful Bahri <sb...@rocketmail.com>.
It's not related to RSS.

--- Cool The Breezer <te...@yahoo.com> wrote:

> Does it index all RSS feeds?
> 
> 
> --- On Tue, 1/27/09, Sjaiful Bahri
> <sb...@rocketmail.com> wrote:
> 
> > From: Sjaiful Bahri <sb...@rocketmail.com>
> > Subject: Crawl News Web
> > To: nutch-user@lucene.apache.org
> > Date: Tuesday, January 27, 2009, 4:53 AM
> > FYI,
> > Zipclue is designed to crawl news information on the
> > web effectively and efficiently.
> > 
> > http://zipclue.com
> > 
> > 
> > 
> > Cheers 
> > iful at http://zipclue.com
> 
> 
>       
> 


Cheers 
iful at http://zipclue.com


      

Re: Crawl News Web

Posted by Cool The Breezer <te...@yahoo.com>.
Does it index all RSS feeds?


--- On Tue, 1/27/09, Sjaiful Bahri <sb...@rocketmail.com> wrote:

> From: Sjaiful Bahri <sb...@rocketmail.com>
> Subject: Crawl News Web
> To: nutch-user@lucene.apache.org
> Date: Tuesday, January 27, 2009, 4:53 AM
> FYI,
> Zipclue is designed to crawl news information on the
> web effectively and efficiently.
> 
> http://zipclue.com
> 
> 
> 
> Cheers 
> iful at http://zipclue.com