Posted to user@nutch.apache.org by ".: Abhishek :." <ab...@gmail.com> on 2011/02/01 02:35:40 UTC

Re: Index while crawling

Hi,

 I am unable to start Solr for the currently running crawl, and when I try the
command below, I get messages saying the linkdb and segments do not exist in the
file system, which is indeed the case.

 So how do I run Solr in this case? Or do I have to run Solr separately instead
of starting it from Nutch itself?

Thanks,
Abhi
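
For anyone hitting the same error: the linkdb and the segments are created by
the crawl steps themselves (generate, fetch, parse, updatedb, invertlinks), so
solrindex can only run after at least one such round has completed. A minimal
sketch of one round followed by indexing, assuming a Nutch 1.x install with seed
URLs in urls/ and a Solr instance already running locally (the paths, the port
and the -topN value are only examples):

  # one crawl round, then push the results to Solr
  bin/nutch inject crawl/crawldb urls
  bin/nutch generate crawl/crawldb crawl/segments -topN 1000
  seg=`ls -d crawl/segments/2* | tail -1`   # newest segment
  bin/nutch fetch $seg
  bin/nutch parse $seg                      # skip if the fetcher is configured to parse
  bin/nutch updatedb crawl/crawldb $seg
  bin/nutch invertlinks crawl/linkdb -dir crawl/segments
  bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*

Note that Nutch does not start Solr for you: solrindex only pushes documents to
a Solr server that is already running, so Solr itself has to be started
separately.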


On Mon, Jan 31, 2011 at 11:51 PM, .: Abhishek :. <ab...@gmail.com> wrote:

> Hi Alexander,
>
>  Thanks for the response. So I should be starting Solr as follows:
>
> bin/nutch solrindex http://127.0.0.1:8080/solr/ crawl/crawldb crawl/linkdb
> crawl/segments/*
>
>  But while fetching we won't have segments yet, right? So how do I start
> Solr in this case?
>
> Thanks,
> Abhi
>
>
> On Mon, Jan 31, 2011 at 7:30 PM, Alexander Aristov <
> alexander.aristov@gmail.com> wrote:
>
>> Yes, you can, but only if you use Nutch + Solr.
>>
>> If you use the old Nutch front end then you might break the index and
>> searching after merging content or indexes.
>>
>> If you don't merge, then search should work during crawling.
>>
>> But remember that results don't become available for searching immediately
>> after fetching. All pages must be fetched and then indexed first to be
>> searchable (a per-segment indexing sketch follows this quoted thread).
>>
>> Best Regards
>> Alexander Aristov
>>
>>
>> On 31 January 2011 13:17, .: Abhishek :. <ab...@gmail.com> wrote:
>>
>> > Hi folks,
>> >
>> >  I should thank you all for the great help you have been offering so far.
>> > I am learning about Nutch quite well.
>> >
>> >  One more beginner's question here - can I search for something while
>> > Nutch is still crawling a site? I believe this is not possible. However,
>> > the reason I am asking is that I am crawling a big site that is updated
>> > frequently with a lot of new pages, and I just wanted to get some quick
>> > results while it's still in progress.
>> >
>> > Thanks,
>> > Abhi
>> >
>>
>
>
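
To illustrate Alexander's point that pages only become searchable once they have
been both fetched and indexed: during a long crawl the wait can be shortened by
indexing each segment as soon as it has been parsed, instead of indexing
everything at the end. A rough sketch, not a supported Nutch script; the loop,
the paths and the Solr URL are assumptions:

  # index every finished segment so early results are searchable while the crawl continues
  for round in 1 2 3; do
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000
    seg=`ls -d crawl/segments/2* | tail -1`
    bin/nutch fetch $seg
    bin/nutch parse $seg
    bin/nutch updatedb crawl/crawldb $seg
    bin/nutch invertlinks crawl/linkdb -dir crawl/segments
    bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb $seg
  done

Each pass makes the newly fetched pages searchable in Solr before the next round
starts, so results trickle in during the crawl instead of appearing only once
everything has finished.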

Re: Index while crawling

Posted by ".: Abhishek :." <ab...@gmail.com>.
Hi all,

 I am still having some trouble figuring this out. I used the instructions at
the following URL:

http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/

 At the end, all I see are search results for the seed URLs that were passed in.
I think I am missing something here; as far as I can tell, the tutorial never
specifies the depth or the number of threads. I suspect that is why only the
seeds show up and no other pages appear when searching in the Solr admin screen.

 Could you please give me some pointers or advice on what I am missing?

Thanks,
Abi
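
If only the seed URLs are showing up, the most likely cause is that only a
single generate/fetch round was run, so no outlinks were ever followed. "Depth"
is simply the number of such rounds: either repeat the step-by-step cycle that
many times, or use the all-in-one crawl command, which accepts -depth and -topN
directly. A sketch assuming Nutch 1.x, with example values:

  # follow outlinks for 3 rounds, at most 1000 URLs per round, 10 fetcher threads
  bin/nutch crawl urls -dir crawl -depth 3 -topN 1000 -threads 10
  # then push everything that was crawled into Solr
  bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*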

On Tue, Feb 1, 2011 at 6:25 PM, Markus Jelsma <ma...@openindex.io> wrote:

> Get your own fresh copy of Solr 1.4.1 (if you get one of the development
> versions you'll need to upgrade the Solr jars in Nutch's lib). Unpack and find
> the example directory. In there you'll overwrite solr/conf/schema.xml with the
> one shipped with Nutch and you're good to go. Run java -jar start.jar and it's
> running. It might also be a good idea to follow the tutorial first.

Re: Index while crawling

Posted by Markus Jelsma <ma...@openindex.io>.
Get your own fresh copy of Solr 1.4.1 (if you get one of the development
versions you'll need to upgrade the Solr jars in Nutch's lib). Unpack and find
the example directory. In there you'll overwrite solr/conf/schema.xml with the
one shipped with Nutch and you're good to go. Run java -jar start.jar and it's
running. It might also be a good idea to follow the tutorial first.
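
Spelled out as shell steps, that setup looks roughly like this (the archive name
and the Nutch path are assumptions; adjust them to wherever you unpacked things):

  # unpack Solr 1.4.1 and replace the example schema with the one shipped in Nutch
  tar xzf apache-solr-1.4.1.tgz
  cp /path/to/nutch/conf/schema.xml apache-solr-1.4.1/example/solr/conf/schema.xml
  cd apache-solr-1.4.1/example
  java -jar start.jar   # the example Solr listens on http://127.0.0.1:8983/solr/

Once it is up, the Solr admin screen is at http://127.0.0.1:8983/solr/admin/ and
the bin/nutch solrindex command from earlier in the thread can point at that URL.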
