Posted to user@nutch.apache.org by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/05/09 15:29:36 UTC

how to update CrawlDB instead of Recrawling???

Hi,
Ricardo, greetings of the day.
We are using Nutch and our corporate application is ready, but because our
client wants refreshed crawl data, we are planning to update our CrawlDB
instead of re-crawling.

Do you have any solution for updating a CrawlDB that has already been
crawled and stores some useful information?

It would be nice to get a solution from you or any of your colleagues.

With Thanks & Regards,

Ratnesh,V2Solutions India

-- 
View this message in context: http://www.nabble.com/how-to-update-CrawlDB-instead-of-Recrawling----tf3715747.html#a10394243
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: how to update CrawlDB instead of Recrawling???

Posted by srampl <se...@gmail.com>.
Hi,

OK, please share your ideas about this as soon as possible. I am waiting for
your reply.

Thanks in advance



Re: how to update CrawlDB instead of Recrawling???

Posted by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com>.
Hi,
A friend of mine, Harmesh, has worked on this issue successfully, and he will
write to you soon about the problems you are facing.



Re: how to update CrawlDB instead of Recrawling???

Posted by srampl <se...@gmail.com>.
Hi,

Did you find a solution for that? I have the same problem. If you have any
idea about this, please tell me how it can be done.

I am waiting for your reply.

Thanks in advance,

regards,
sram



Re: how to update CrawlDB instead of Recrawling???

Posted by Brian Demers <br...@gmail.com>.
Does anyone know of a nicer way of doing this?



On 8/13/07, Renaud Richardet <re...@apache.org> wrote:
> Not sure, but I think it's just to flush the cached index.
>
>
> Brian Demers wrote:
> > Why does the web app need to be restarted? Are the index files on the
> > classpath or something? It seems like a hack.

Re: how to update CrawlDB instead of Recrawling???

Posted by Renaud Richardet <re...@apache.org>.
Not sure, but I think it's just to flush the cached index.


Brian Demers wrote:
> Why does the web app need to be restarted? Are the index files on the
> classpath or something? It seems like a hack.


Re: how to update CrawlDB instead of Recrawling???

Posted by Brian Demers <br...@gmail.com>.
Why does the web app need to be restarted? Are the index files on the
classpath or something? It seems like a hack.


On 8/13/07, srampl <se...@gmail.com> wrote:
>
> Hi,
>
> Thanks for this valuable information.
>
> I need continuous, up-to-date results in Nutch. I have old crawl data,
> "crawlA", and the latest crawl data, "crawlB". You said to run the command
> "touch $tomcat_dir/WEB-INF/web.xml" in the script after the merge, and
> that's fine. But while crawlA and crawlB are being merged, we can't return
> any results; the search displays an empty page. That's why I am asking how
> to solve this problem.
>
> Thanks

Re: how to update CrawlDB instead of Recrawling???

Posted by srampl <se...@gmail.com>.
Hi,

Thanks for this valuable information.

I need continuous, up-to-date results in Nutch. I have old crawl data,
"crawlA", and the latest crawl data, "crawlB". You said to run the command
"touch $tomcat_dir/WEB-INF/web.xml" in the script after the merge, and
that's fine. But while crawlA and crawlB are being merged, we can't return
any results; the search displays an empty page. That's why I am asking how
to solve this problem.

Thanks



Re: how to update CrawlDB instead of Recrawling???

Posted by Tomislav Poljak <tp...@gmail.com>.
Hi,
if it helps:

You don't need to restart Tomcat to load index changes; it is enough to
restart the individual web application (without restarting the Tomcat
service) by touching the application's web.xml file. This is faster than
restarting Tomcat. Add:

touch $tomcat_dir/WEB-INF/web.xml

to the end of your script (where $tomcat_dir is the directory of the
deployed Nutch web application) and this will "tell Tomcat to reload index".

Tomislav
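
A minimal sketch of that last script step as a reusable shell function. It
assumes Tomcat watches WEB-INF/web.xml for this application (the stock
conf/context.xml typically lists it as a WatchedResource) and that automatic
deployment is enabled on the host; the function name reload_webapp is
hypothetical. It only updates the file's timestamp; Tomcat decides when to
reload.

```shell
#!/bin/sh
# Hypothetical helper: ask Tomcat to reload one web application by updating
# the modification time of its WEB-INF/web.xml (the trick described above).
# Assumes Tomcat watches that file for changes; it does not restart Tomcat.
reload_webapp() {
  webapp_dir="$1"                        # e.g. the deployed Nutch webapp dir
  web_xml="$webapp_dir/WEB-INF/web.xml"
  if [ ! -f "$web_xml" ]; then
    echo "reload_webapp: $web_xml not found" >&2
    return 1
  fi
  touch "$web_xml"                       # new mtime, so Tomcat should reload
}
```

At the end of a recrawl script this would be called as
reload_webapp "$tomcat_dir". Failing loudly when web.xml is missing avoids
silently "reloading" nothing because of a wrong path.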



On Fri, 2007-08-10 at 02:50 -0700, srampl wrote:
> Hi,
> 
> Thanks for your reply,
>
> I did that step, but during the merge of the old index (i.e. the index
> Tomcat is currently serving) and the new index, and after the merge, no
> search results are returned until Tomcat is restarted. So we can't provide
> continuous search results during the merge or after it until we restart
> Tomcat.
>
> Please give me some ideas about this.
> 
> Thanks in advance,


Re: how to update CrawlDB instead of Recrawling???

Posted by srampl <se...@gmail.com>.
Hi,

Thanks for your reply.

I did that step, but during the merge of the old index (i.e. the index
Tomcat is currently serving) and the new index, and after the merge, no
search results are returned until Tomcat is restarted. So we can't provide
continuous search results during the merge or after it until we restart
Tomcat.

Please give me some ideas about this.

Thanks in advance,





Re: how to update CrawlDB instead of Recrawling???

Posted by "Harmesh, V2solutions" <ha...@in.v2solutions.com>.
Hi,
The crawl can be updated by running the generate, fetch and update cycle
again, step by step. Generate creates a new segment, and after the documents
are fetched, the update step merges the results into the existing crawl.
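
The cycle described above can be sketched as a small shell function. This is
a minimal sketch, not a tested recipe: it assumes Nutch 0.8/0.9-style
"bin/nutch generate", "fetch" and "updatedb" commands and a crawl directory
containing crawldb/ and segments/; the names refresh_round and NUTCH are
hypothetical. Indexing the new segment is a separate step not shown here.

```shell
#!/bin/sh
# Sketch of one "generate, fetch, update" round on an existing crawl.
# Assumes Nutch 0.8/0.9-style commands and a crawl directory containing
# crawldb/ and segments/. NUTCH and refresh_round are illustrative names.
NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher script

refresh_round() {
  crawl_dir="$1"

  # 1. Generate: write a fetch list of due URLs into a new segment.
  "$NUTCH" generate "$crawl_dir/crawldb" "$crawl_dir/segments" || return 1

  # Segment directories are named by timestamp, so the last one is newest.
  segment=$(ls -d "$crawl_dir"/segments/* | tail -1)

  # 2. Fetch: download the pages listed in that segment.
  "$NUTCH" fetch "$segment" || return 1

  # 3. Update: fold the fetch results back into the existing CrawlDB.
  "$NUTCH" updatedb "$crawl_dir/crawldb" "$segment" || return 1
}
```

For example, refresh_round crawl runs one round against ./crawl; calling it
again performs another round. Stopping at the first failing step keeps a
half-fetched segment from being merged into the CrawlDB.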




Re: how to update CrawlDB instead of Recrawling???

Posted by John Mendenhall <jo...@surfutopia.net>.
> http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.htm
> 
> The link above is not working...

Change the extension from htm to html and it works.

JohnM


-- 
john mendenhall
john@surfutopia.net
surf utopia
internet services

Re: how to update CrawlDB instead of Recrawling???

Posted by bikram <bi...@yahoo.com>.

Hi,

http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.htm

The link above is not working.

Instead, I found the following link:

http://wiki.apache.org/nutch/IntranetRecrawl?action=show#head-e58e25a0b9530bb6fcdfb282fd27a207fc0aff03

It covers intranet recrawling for both Nutch 0.8 and 0.9.

It might be helpful for someone.

Thanks,
bikram



Naess, Ronny wrote:
> 
>  Take a look at this article
> http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
> 
> -Ronny
> 
> -----Original Message-----
> From: Ratnesh,V2Solutions India
> [mailto:ratnesh.srivastava@in.v2solutions.com] 
> Sent: 9 May 2007 15:30
> To: nutch-user@lucene.apache.org
> Subject: how to update CrawlDB instead of Recrawling???