Posted to dev@nutch.apache.org by ahmed ghouzia <gh...@yahoo.com> on 2006/05/26 15:15:54 UTC

Where exactly nutch scoring takes place ?

I want to use Nutch as an environment to test my proposed algorithm for web mining.

1- Where exactly does Nutch scoring take place? In which packages or files?

2- Can the LinkAnalysisTool be run at the intranet level? Some documents mention that it can run only at the whole-web crawling level.

3- What technologies and concepts must I be familiar with to get into Nutch development?
Is it only JSP and servlets, or anything else?

		

Re: Where exactly nutch scoring takes place ?

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi,
> I want to use Nutch as an environment to test my proposed algorithm
> for web mining.
>
> 1- Where exactly does Nutch scoring take place? In which
> packages or files?
Check the latest sources; there is a new Scoring API and a default
plugin-based implementation of OPIC.
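
For readers unfamiliar with OPIC (On-line Page Importance Computation), the core idea can be sketched roughly as below. This is a minimal illustration under simplifying assumptions, not the code of the Nutch scoring plugin: the class name OpicSketch, the method visit, and the in-memory maps are all hypothetical. Each page holds some "cash"; when a page is visited, its cash is added to its history and split equally among its outlinks.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified OPIC-style step. Hypothetical names; NOT the Nutch plugin code.
class OpicSketch {

    /**
     * Visits one page: records its current cash in its history,
     * resets its cash to zero, and splits the cash equally among outlinks.
     */
    static void visit(String page,
                      Map<String, List<String>> outlinks,
                      Map<String, Double> cash,
                      Map<String, Double> history) {
        double current = cash.getOrDefault(page, 0.0);
        history.merge(page, current, Double::sum);
        cash.put(page, 0.0);

        List<String> targets =
                outlinks.getOrDefault(page, Collections.<String>emptyList());
        if (targets.isEmpty()) {
            return; // assumption: cash of a page without outlinks is dropped
        }
        double share = current / targets.size();
        for (String target : targets) {
            cash.merge(target, share, Double::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = new HashMap<>();
        outlinks.put("a", Arrays.asList("b", "c"));

        Map<String, Double> cash = new HashMap<>();
        cash.put("a", 1.0);
        Map<String, Double> history = new HashMap<>();

        visit("a", outlinks, cash, history);
        System.out.println("cash=" + cash + " history=" + history);
    }
}
```

In this sketch the accumulated history acts as the importance estimate of a page. A real implementation would also have to decide how to treat pages without outlinks and how to weight internal versus external links.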
>
> 2- Can the LinkAnalysisTool be run at the intranet level? Some
> documents mention that it can run only at the whole-web
> crawling level.
That question is not clear to me; however, you can easily hack the
code to process only the pages you are interested in.
There are other workarounds as well, e.g. fetching only intranet
pages, if that works for you.
>
> 3- What technologies and concepts must I be familiar with to
> get into Nutch development?

You should be able to write MapReduce jobs and understand the Hadoop
io package if you want to do some custom analyses.
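
As a rough illustration of the map/reduce style of processing, here is a plain-Java sketch that counts inlinks per URL. It deliberately does not use the Hadoop API; the class InlinkCountSketch and its map, reduce and run methods are hypothetical names chosen only to show the shape of such a job: a map step that emits key/value pairs, a grouping step, and a reduce step that combines all values for one key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of a map/reduce job. Hypothetical names; no Hadoop API.
class InlinkCountSketch {

    /** "Map" step: for one (source, outlinks) record, emit a (target, 1) pair per link. */
    static List<Map.Entry<String, Integer>> map(String source, List<String> outlinks) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String target : outlinks) {
            emitted.add(new AbstractMap.SimpleEntry<>(target, 1));
        }
        return emitted;
    }

    /** "Reduce" step: sum all values emitted for one key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Driver: runs map over every record, groups by key, then reduces each group. */
    static Map<String, Integer> run(Map<String, List<String>> graph) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> record : graph.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new TreeMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("c"));
        System.out.println(run(graph)); // prints {b=1, c=2}
    }
}
```

In a real job the framework performs the grouping and distributes the work across machines, and keys and values must be serializable types, which is where the io package comes in.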

HTH
Stefan


RE: NPE When using a merged segment

Posted by Gal Nitzan <gn...@usa.net>.
I was about to look into it, but wasn't sure which variable holds the new
segment name that should replace the old one :) Lucky for me you read this
email... :)








Re: NPE When using a merged segment

Posted by Andrzej Bialecki <ab...@getopt.org>.
Gal Nitzan wrote:
> I think it is a bug. It saves the old segment name instead of replacing it
> with the new segment name
>
>   


I confirm, this is a bug - I forgot that Indexer relies on this metadata 
... I'll fix it in a moment - sorry for the trouble!

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



RE: NPE When using a merged segment

Posted by Gal Nitzan <gn...@usa.net>.
I think it is a bug. It saves the old segment name instead of replacing it
with the new segment name






Re: NPE When using a merged segment

Posted by Dominik Friedrich <do...@wipe-records.org>.
I have the same problem with a merged segment. I looked at the index with
Luke, and it seems that the indexer stores the old segment names there
instead of the name of the merged segment. I'm not sure whether I did
something wrong or this is a bug.

Dominik




RE: NPE When using a merged segment

Posted by Gal Nitzan <gn...@usa.net>.
Hi,

I have built a new index based on the new segment only.






Re: NPE When using a merged segment

Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Gal Nitzan wrote:
> Hi,
> 
> After using mergesegs to merge all my segments to one segment only, I moved
> the new segment to segments.
> 
> When accessing the web UI I get:
> 
> java.lang.RuntimeException: java.lang.NullPointerException
> 	org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:203)
> 	org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:329)
> 	org.apache.jsp.search_jsp._jspService(org.apache.jsp.search_jsp:175)

Hi,

I'm not sure, but have you tried reindexing that new segment? To my
understanding the index holds references to the segment (the segment name),
and in your case those are invalid. This would also explain the error
you get (in the call to getSummary), because the summary is fetched from the
segment.
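
The failure mode described above can be sketched as follows. This is an assumed, simplified model and not the actual FetchedSegments code: the class SegmentLookupSketch, its methods, and the segment names used here are hypothetical. Segments are kept in a map keyed by name, and a lookup with a stale name taken from the index yields null, which then causes the NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of looking up summaries by segment name.
// Hypothetical names; NOT the real FetchedSegments implementation.
class SegmentLookupSketch {

    // segment name -> (url -> summary)
    private final Map<String, Map<String, String>> segments = new HashMap<>();

    void addSegment(String name, Map<String, String> summariesByUrl) {
        segments.put(name, summariesByUrl);
    }

    /** Unchecked lookup: throws NullPointerException if the segment name is unknown. */
    String getSummaryUnchecked(String segmentName, String url) {
        return segments.get(segmentName).get(url);
    }

    /** Defensive lookup: reports a stale segment name with a readable message. */
    String getSummary(String segmentName, String url) {
        Map<String, String> segment = segments.get(segmentName);
        if (segment == null) {
            throw new IllegalStateException(
                    "Index refers to unknown segment '" + segmentName + "'");
        }
        return segment.get(url);
    }

    public static void main(String[] args) {
        SegmentLookupSketch lookup = new SegmentLookupSketch();
        Map<String, String> merged = new HashMap<>();
        merged.put("http://example.com/", "Example summary");
        lookup.addSegment("merged-segment", merged);

        // The index entry still carries the pre-merge segment name.
        try {
            lookup.getSummaryUnchecked("old-segment", "http://example.com/");
        } catch (NullPointerException e) {
            System.out.println("NPE for stale segment name");
        }
    }
}
```

Under this model, reindexing the merged segment replaces the stale names in the index with the new one, so the lookup succeeds again.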

If this works, then maybe you'll need to find a better way of cleaning
up the index: not reindexing everything, but maybe just rewriting the
segment names all into one or so.

Feedback welcome.


Good luck,
 Stefan

NPE When using a merged segment

Posted by Gal Nitzan <gn...@usa.net>.
Hi,

After using mergesegs to merge all my segments into a single segment, I moved
the new segment to segments.

When accessing the web UI I get:

java.lang.RuntimeException: java.lang.NullPointerException
	org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:203)
	org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:329)
	org.apache.jsp.search_jsp._jspService(org.apache.jsp.search_jsp:175)

Gal.



Re: Where exactly nutch scoring takes place ?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Gal Nitzan wrote:
> Hi,
>
> The scoring in Nutch 0.8 is done in a plugin: scoring-opic. It is called from
> Indexer.java
>   

... plus in 6 other places ...

-- 
Best regards,
Andrzej Bialecki     <><



RE: Where exactly nutch scoring takes place ?

Posted by Gal Nitzan <gn...@usa.net>.
Hi,

The scoring in Nutch 0.8 is done in a plugin: scoring-opic. It is called from
Indexer.java

HTH


