Posted to user@nutch.apache.org by Alvaro Cabrerizo <to...@gmail.com> on 2007/02/07 20:03:00 UTC

loading different indexes in tomcat

Hi:

 I've got two different indexes: myIndexA and myIndexB. I wanted to load
both in Tomcat without merging them, so I created a new directory, myIndexC.
Under myIndexC/indexes I deployed myIndexA/indexes/part-00000 and
myIndexB/indexes/part-00000, the latter renamed to part-00001. Then I copied
myIndexA/segments/* and myIndexB/segments/* to myIndexC/segments. I also
moved linkdb/current/part-00000 from myIndexA and myIndexB to myIndexC.
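
The copy steps described above can be sketched in Python. This is only an illustration of the directory shuffling, not a Nutch tool: the function name combine_crawl_dirs is hypothetical, and it assumes each source directory has an indexes/ folder containing part-* subdirectories and a segments/ folder containing one subdirectory per segment. The linkdb step is left out because the post does not say how the two part-00000 entries were kept apart.

```python
import shutil
from pathlib import Path


def combine_crawl_dirs(sources, dest):
    """Copy index parts and segments from several crawl dirs into one.

    Hypothetical helper: nothing is merged, the parts are only placed
    side by side under dest. Returns the number of index parts copied.
    """
    dest = Path(dest)
    (dest / "indexes").mkdir(parents=True, exist_ok=True)
    (dest / "segments").mkdir(parents=True, exist_ok=True)

    part_no = 0
    for src in map(Path, sources):
        # Renumber index parts so that part-00000 from the second crawl
        # becomes part-00001 instead of colliding with the first.
        for part in sorted((src / "indexes").glob("part-*")):
            if part.is_dir():
                shutil.copytree(part, dest / "indexes" / f"part-{part_no:05d}")
                part_no += 1
        # Segments keep their own names; copytree raises FileExistsError
        # if two crawls happen to contain a segment with the same name.
        for seg in sorted((src / "segments").iterdir()):
            if seg.is_dir():
                shutil.copytree(seg, dest / "segments" / seg.name)
    return part_no
```

Usage would look like combine_crawl_dirs(["myIndexA", "myIndexB"], "myIndexC"), after which searcher.dir can point at myIndexC.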

To summarize, I moved directories from indexes A and B to C. Then I started
Tomcat (searcher.dir points to C), and searches give me hits from both
indexes A and B (based on what explain.jsp says). So is there any difference
in the results I get (e.g. page ranking) between this process and a real
merge using Nutch's built-in commands?

Thanks.

Re: why did nutch0.8.1 fetch empty content from certain sites?

Posted by Jason Culverhouse <ja...@mischievous.org>.

It could be related to http://issues.apache.org/jira/browse/NUTCH-374:
when the property http.content.limit is set to -1 and the data from the
server is gzip'ed, the content is not decoded properly.
Jason
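
One way to check whether a stored page body is in fact still gzip-compressed is to look for the gzip magic bytes. The following is a minimal Python sketch, not Nutch code; looks_gzipped and decode_body are hypothetical helper names, and it assumes the raw fetched bytes are available for inspection.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(body: bytes) -> bool:
    """Return True if the body begins with the gzip magic number."""
    return body[:2] == GZIP_MAGIC


def decode_body(body: bytes) -> bytes:
    """Decompress the body if it is still gzip-encoded, else return it as is."""
    if looks_gzipped(body):
        return gzip.decompress(body)
    return body
```

If looks_gzipped returns True for content whose fetch status was "fetch_success", a parser given those bytes would see compressed data rather than HTML, which would be consistent with empty parsed text.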

On Feb 8, 2007, at 6:45 AM, wangxu wrote:
> wangxu wrote:
>> When I fetched certain sites,
>> I got empty content and contentType, but the fetch status was
>> "fetch_success" and the metadata was sometimes not empty.
>>
>> How does a website configure itself to achieve this?
>> Are there any methods to avoid this situation?
>> I used the agent name:
>> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; MyIE2; .NET CLR
>> 1.1.4322)
>>
>>
> Sorry: empty content, parsedtext/parseddata

Re: why did nutch0.8.1 fetch empty content from certain sites?

Posted by wangxu <wa...@souchang.com>.

wangxu wrote:
> When I fetched certain sites,
> I got empty content and contentType, but the fetch status was
> "fetch_success" and the metadata was sometimes not empty.
>
> How does a website configure itself to achieve this?
> Are there any methods to avoid this situation?
> I used the agent name:
> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; MyIE2; .NET CLR
> 1.1.4322)
>
>
Sorry: empty content, parsedtext/parseddata

why did nutch0.8.1 fetch empty content from certain sites?

Posted by wangxu <wa...@souchang.com>.

When I fetched certain sites,
I got empty content and contentType, but the fetch status was "fetch_success"
and the metadata was sometimes not empty.

How does a website configure itself to achieve this?
Are there any methods to avoid this situation?
I used the agent name:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; MyIE2; .NET CLR 1.1.4322)