Posted to user@nutch.apache.org by Sjaiful Bahri <sb...@rocketmail.com> on 2009/01/27 10:53:44 UTC
Crawl News Web
FYI,
Zipclue is designed to crawl news information on the
web effectively and efficiently.
http://zipclue.com
Cheers
iful at http://zipclue.com
Re: Crawl News Web
Posted by W <wi...@gmail.com>.
I don't know. Ask the creator of Zipclue, Sjaiful Bahri.
On Thu, Feb 12, 2009 at 12:39 PM, Saurabh Bhutyani <sa...@in.com> wrote:
> Hi Wildan,
> I don't find the recent news from the last 23 days when I search on
> Zipclue. What is the crawl frequency? Also, are you storing and
> displaying the results from a db? The search is quite slow.
>
> Original message
> From: Sjaiful Bahri <sbahri@rocketmail.com>
> Date: 29 Jan 09 10:51:36
> Subject: Re: Crawl News Web
> To: nutch-user@lucene.apache.org
>
> Hello Wildan,
> This is the process to crawl
--
---
OpenThink Labs
www.tobethink.com
Aligning IT and Education
>> 021-99325243
Y! : hawking_123
LinkedIn : http://www.linkedin.com/in/wildanmaulana
Re: Crawl News Web
Posted by Saurabh Bhutyani <sa...@in.com>.
Hi Wildan,
I don't find the recent news from the last 23 days when I search on
Zipclue. What is the crawl frequency? Also, are you storing and
displaying the results from a db? The search is quite slow.

Original message
From: Sjaiful Bahri <sbahri@rocketmail.com>
Date: 29 Jan 09 10:51:36
Subject: Re: Crawl News Web
To: nutch-user@lucene.apache.org

Hello Wildan,
This is the process to crawl news sites:
1. Depth-first search to identify news sites
2. Crawl process using regular expressions
3. Save the resulting contents into a database
4. Users can then find news through the database

--- W <wi...@gmail.com> wrote:
> Can you share the architecture you use? Are you also using Nutch
> for the backend?
>
> Regards,
> Wildan
>
> On Tue, Jan 27, 2009 at 4:53 PM, Sjaiful Bahri
> <sb...@rocketmail.com> wrote:
> > FYI,
> > Zipclue is designed to crawl news information on the
> > web effectively and efficiently.
> >
> > http://zipclue.com
> >
> > Cheers
> > iful at http://zipclue.com
>
> --
> ---
> tobeThink!
> www.tobethink.com
>
> Aligning IT and Education
>
> >> 021-99325243
> Y! : hawking_123
> LinkedIn : http://www.linkedin.com/in/wildanmaulana

Cheers
iful at http://zipclue.com
Re: Crawl News Web
Posted by Sjaiful Bahri <sb...@rocketmail.com>.
Hello Wildan,
This is the process to crawl news sites:
1. Depth-first search to identify news sites
2. Crawl process using regular expressions
3. Save the resulting contents into a database
4. Users can then find news through the database
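
For readers who want something concrete, the four steps above could be sketched roughly as follows. This is only an illustration and not Zipclue's actual code: the in-memory PAGES dictionary stands in for real HTTP fetching, and the URL pattern, the table layout, and every function name here are assumptions made for the example.

```python
import re
import sqlite3

# Hypothetical in-memory "web" (URL -> HTML) so the sketch runs offline.
# A real crawler would fetch these pages over HTTP instead.
PAGES = {
    "http://example.com/": (
        '<a href="http://example.com/news/">News</a> '
        '<a href="http://example.com/about">About</a>'
    ),
    "http://example.com/news/": (
        '<a href="http://example.com/news/story-1.html">s1</a> '
        '<a href="http://example.com/news/story-2.html">s2</a>'
    ),
    "http://example.com/news/story-1.html": "<title>Nutch 1.0 released</title>",
    "http://example.com/news/story-2.html": "<title>Crawler tips</title>",
    "http://example.com/about": "<title>About us</title>",
}

LINK_RE = re.compile(r'href="([^"]+)"')
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S)
# Assumed rule for what counts as a news article URL.
NEWS_URL_RE = re.compile(r"/news/[^/]+\.html$")


def fetch(url):
    """Return the HTML for a URL, or an empty string if it is unknown."""
    return PAGES.get(url, "")


def crawl_depth_first(seed, max_depth=3):
    """Steps 1 and 2: walk links depth-first, yield (url, title) for news pages."""
    seen = set()
    stack = [(seed, 0)]
    while stack:
        url, depth = stack.pop()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        html = fetch(url)
        if NEWS_URL_RE.search(url):
            match = TITLE_RE.search(html)
            yield url, (match.group(1).strip() if match else "")
        # Push links in reverse so the first link on the page is visited first.
        for link in reversed(LINK_RE.findall(html)):
            stack.append((link, depth + 1))


def save(conn, rows):
    """Step 3: store the crawled results in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO news VALUES (?, ?)", rows)
    conn.commit()


def search(conn, term):
    """Step 4: answer a user query from the database, not from the live web."""
    cur = conn.execute(
        "SELECT url, title FROM news WHERE title LIKE ? ORDER BY url",
        (f"%{term}%",),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
save(conn, list(crawl_depth_first("http://example.com/")))
print(search(conn, "Nutch"))
```

Searching with a plain LIKE query is the simplest choice for a sketch, but it scans every row, so a larger collection would probably need a full-text index to stay responsive.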
--- W <wi...@gmail.com> wrote:
> Can you share the architecture you use? Are you also using Nutch
> for the backend?
>
>
> Regards,
> Wildan
>
> On Tue, Jan 27, 2009 at 4:53 PM, Sjaiful Bahri
> <sb...@rocketmail.com> wrote:
> > FYI,
> > Zipclue is designed to crawl news information on the
> > web effectively and efficiently.
> >
> > http://zipclue.com
> >
> >
> >
> > Cheers
> > iful at http://zipclue.com
> >
> >
> >
> >
>
>
>
> --
> ---
> tobeThink!
> www.tobethink.com
>
> Aligning IT and Education
>
> >> 021-99325243
> Y! : hawking_123
> LinkedIn : http://www.linkedin.com/in/wildanmaulana
>
Cheers
iful at http://zipclue.com
Re: Crawl News Web
Posted by W <wi...@gmail.com>.
Can you share the architecture you use? Are you also using Nutch
for the backend?
Regards,
Wildan
On Tue, Jan 27, 2009 at 4:53 PM, Sjaiful Bahri <sb...@rocketmail.com> wrote:
> FYI,
> Zipclue is designed to crawl news information on the
> web effectively and efficiently.
>
> http://zipclue.com
>
>
>
> Cheers
> iful at http://zipclue.com
>
>
>
>
--
---
tobeThink!
www.tobethink.com
Aligning IT and Education
>> 021-99325243
Y! : hawking_123
LinkedIn : http://www.linkedin.com/in/wildanmaulana
Re: Crawl News Web
Posted by Saurabh Bhutyani <sa...@in.com>.
Hi Sjaiful Bahri,
I don't find the recent news from the last 23 days when I search on
Zipclue. What is the crawl frequency? Also, are you storing and
displaying the results from a db? The search is quite slow.

Original message
From: Sjaiful Bahri <sbahri@rocketmail.com>
Date: 07 Feb 09 09:50:11
Subject: Re: Crawl News Web
To: nutch-user@lucene.apache.org, techcool.kumar@yahoo.com

It's not related to RSS.

--- Cool The Breezer <te...@yahoo.com> wrote:
> Does it index all RSS feeds?
>
> --- On Tue, 1/27/09, Sjaiful Bahri
> <sb...@rocketmail.com> wrote:
>
> > From: Sjaiful Bahri <sb...@rocketmail.com>
> > Subject: Crawl News Web
> > To: nutch-user@lucene.apache.org
> > Date: Tuesday, January 27, 2009, 4:53 AM
> > FYI,
> > Zipclue is designed to crawl news information on the
> > web effectively and efficiently.
> >
> > http://zipclue.com
> >
> > Cheers
> > iful at http://zipclue.com

Cheers
iful at http://zipclue.com
Re: Crawl News Web
Posted by Sjaiful Bahri <sb...@rocketmail.com>.
It's not related to RSS.
--- Cool The Breezer <te...@yahoo.com> wrote:
> Does it index all RSS feeds?
>
>
> --- On Tue, 1/27/09, Sjaiful Bahri
> <sb...@rocketmail.com> wrote:
>
> > From: Sjaiful Bahri <sb...@rocketmail.com>
> > Subject: Crawl News Web
> > To: nutch-user@lucene.apache.org
> > Date: Tuesday, January 27, 2009, 4:53 AM
> > FYI,
> > Zipclue is designed to crawl news information on the
> > web effectively and efficiently.
> >
> > http://zipclue.com
> >
> >
> >
> > Cheers
> > iful at http://zipclue.com
>
>
>
>
Cheers
iful at http://zipclue.com
Re: Crawl News Web
Posted by Cool The Breezer <te...@yahoo.com>.
Does it index all RSS feeds?
--- On Tue, 1/27/09, Sjaiful Bahri <sb...@rocketmail.com> wrote:
> From: Sjaiful Bahri <sb...@rocketmail.com>
> Subject: Crawl News Web
> To: nutch-user@lucene.apache.org
> Date: Tuesday, January 27, 2009, 4:53 AM
> FYI,
> Zipclue is designed to crawl news information on the
> web effectively and efficiently.
>
> http://zipclue.com
>
>
>
> Cheers
> iful at http://zipclue.com