Posted to user@couchdb.apache.org by affi <sw...@hotmail.com> on 2013/10/07 23:24:37 UTC

crawl webpages into couchdb

Hi,
I am a beginner at CouchDB and am learning it for a uni project. I have
watched many tutorials on JSON and understand how to add documents, but I don't
understand how to crawl webpages and store them in a CouchDB database.
I would definitely appreciate some help with this. Thanks.


Re: crawl webpages into couchdb

Posted by Jens Rantil <je...@gmail.com>.
Hi Affi,

Your question is very unspecific. We have no idea what kind of information
you are going to crawl from websites, and you need to decide that for
yourself (or find a forum that's more specific to crawling). If you tell us
what information you have and how you would like to query it, we might be
able to help you.

Cheers,
Jens


Re: crawl webpages into couchdb

Posted by Jens Rantil <je...@gmail.com>.
They seem to not be. There's been a couple of double posts lately...

On Wednesday, 9 October 2013, Mark Deibert wrote:

> @Stanley: Ambiguous? Definitely some :-) How are these mailing lists
> moderated? Or are they not?

Re: crawl webpages into couchdb

Posted by Mark Deibert <ma...@gmail.com>.
@Stanley: Ambiguous? Definitely some :-) How are these mailing lists
moderated? Or are they not?


Re: crawl webpages into couchdb

Posted by Stanley Iriele <si...@gmail.com>.
We're in an "I use CouchDB and I have a question" group... which can be a
little ambiguous at times.

Re: crawl webpages into couchdb

Posted by Mark Deibert <ma...@gmail.com>.
Are we in a CouchDB group or a "web crawler" apps group? :-/


Re: crawl webpages into couchdb

Posted by Chad Cross <ch...@gmail.com>.
Affi,

CouchDB doesn't natively solve the web crawling issue. I'm currently
experimenting with Scrapy (http://scrapy.org) for web crawling, but I
haven't advanced enough to start pushing my crawling data into CouchDB.
Maybe some users out there have experience with Scrapy and CouchDB?

-Chad
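
For anyone landing on this thread later, here is a minimal sketch of the
"push crawled data into CouchDB" step, using only the Python standard library
rather than Scrapy. It is one possible approach, not anything proposed in the
thread. The server address http://localhost:5984, the database name "crawl"
(which must already exist), the document fields, and all function names are
assumptions made up for illustration. The only CouchDB behaviour it relies on
is the standard HTTP document API: PUT /{db}/{docid} with a JSON body.

```python
import hashlib
import json
import urllib.request

# Assumptions for illustration: a local CouchDB and an existing database
# named "crawl". Neither is mentioned in the thread.
COUCH_URL = "http://localhost:5984"
DB_NAME = "crawl"


def page_to_doc(url, html, fetched_at):
    """Build a JSON-serializable document for one crawled page."""
    return {
        # Hash of the URL as the document ID, so each page maps to one document.
        "_id": hashlib.sha1(url.encode("utf-8")).hexdigest(),
        "url": url,
        "html": html,
        "fetched_at": fetched_at,
    }


def build_put_request(doc):
    """Prepare (but do not send) PUT /{db}/{docid} with the document as JSON."""
    return urllib.request.Request(
        f"{COUCH_URL}/{DB_NAME}/{doc['_id']}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def crawl_and_store(url, fetched_at):
    """Fetch one page and store it; needs network access and a running CouchDB."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    request = build_put_request(page_to_doc(url, html, fetched_at))
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling crawl_and_store("http://example.com/", "2013-10-09T18:09:00Z") would
fetch the page and store it as one document. Storing the same URL a second
time fails with a 409 Conflict unless the document's current _rev is included,
so a real crawler would need to read the existing revision first or choose
IDs differently.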


Re: crawl webpages into couchdb

Posted by Brad Rhoads <bd...@gmail.com>.
Or better yet, CasperJS.

Re: crawl webpages into couchdb

Posted by Mark Hahn <ma...@reevuit.com>.
Use Node.js, PhantomJS, and the nano CouchDB driver.

