Posted to user@nutch.apache.org by Phạm Hải Thanh <ph...@vasc.com.vn> on 2007/06/12 03:29:00 UTC

Cache problem,

Hi all,
I have a problem with the cache: crawling and searching work fine, but the cached page is displayed with squares and question marks. Please take a look at
http://192.168.71.66:8080/cached.jsp?idx=0&id=1. I have tried changing some configuration settings, but with no luck. Does anyone have any idea?

By the way, does anyone know how to turn off caching (i.e., not store the cached data)?

Thanks all



RE: Cache problem,

Posted by Phạm Hải Thanh <ph...@vasc.com.vn>.
Hi Enzo, hi all
I fixed it all yesterday, which is why it looks fine to everyone now ^^
For some reason, cached.jsp could not get the charset from the hit, so I forced it to UTF-8:

content = new String(bean.getContent(details), "utf-8");

Thanks to xiong.xu.cn@gmail.com for this.
Thank you very much, Enzo.
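Forcing "utf-8" works when every crawled page is UTF-8. A slightly more defensive variant, sketched below, tries the page's own charset first and only falls back to UTF-8 when that name is missing or unusable. This is a minimal sketch, not Nutch code: the class and method names (`CachedText`, `decode`) are hypothetical, and it assumes you already have the raw bytes (in cached.jsp, `bean.getContent(details)`) plus the charset name as a string, or null if none was recorded. Decoding UTF-8 bytes with the wrong charset is one common way to end up with the squares and question marks described at the top of the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CachedText {
    /**
     * Decodes raw page bytes with the given charset name (hypothetical helper),
     * falling back to UTF-8 when the name is null, blank, malformed or unsupported.
     * Unlike new String(bytes), the result never depends on the JVM's default charset.
     */
    public static String decode(byte[] raw, String charsetName) {
        Charset cs = StandardCharsets.UTF_8; // fallback, same choice as the fix above
        if (charsetName != null && !charsetName.trim().isEmpty()) {
            try {
                cs = Charset.forName(charsetName.trim());
            } catch (IllegalArgumentException e) {
                // IllegalCharsetNameException and UnsupportedCharsetException
                // both extend IllegalArgumentException; keep the UTF-8 fallback
            }
        }
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        String name = "Ph\u1ea1m H\u1ea3i Thanh";
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, null));         // no charset known: UTF-8 fallback
        System.out.println(decode(utf8, "ISO-8859-1")); // wrong charset: garbled text
    }
}
```

Passing an explicit `Charset` to the `String` constructor is the key design point: the no-charset constructor silently uses the platform default, which differs between servers.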


-----Original Message-----
From: Enzo Michelangeli [mailto:enzomich@gmail.com] 
Sent: 13 June 2007 6:46 AM
To: nutch-user@lucene.apache.org
Subject: Re: Cache problem,

----- Original Message ----- 
From: "Phạm Hải Thanh" <ph...@vasc.com.vn>
Sent: Tuesday, June 12, 2007 10:06 AM

> Oops, sorry, here is the link:
> http://203.162.71.66:8080/cached.jsp?idx=0&id=1
>
> I also think this is an encoding issue :(

It looks fine to me, both with Firefox and MSIE 7 (and UTF-8 encoding in 
both cases). Are you sure you configured your browser for automatic 
selection of the encoding?

> About this config
>
> <property>
>  <name>fetcher.store.content</name>
>  <value>false</value>
>  <description>If true, fetcher will store content.</description>
> </property>
>
> I tried it before, but I'm not sure it turns off the cache, because
> the db is the same size before and after this change. I will try it
> again.

In my experience it reduces the size of the segments by about 60 to 70%. The
crawldb (maintained by updatedb) and the linkdb should be unaffected, as they
don't hold anything but URLs in the first place...

Enzo
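Since the saving is expected in the segments rather than in the crawldb or linkdb, it helps to compare segment subdirectories directly. The sketch below sums file sizes per subdirectory of one segment. It assumes the segment sits on the local filesystem (not HDFS) and that, as in the Nutch 0.8/0.9 layout, raw page bytes live in a `content` subdirectory, so that is the entry expected to shrink or disappear for segments fetched with `fetcher.store.content` set to false. `SegmentSizes` is a hypothetical helper, not part of Nutch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class SegmentSizes {
    /** Total bytes of all regular files under dir; 0 if dir does not exist. */
    public static long sizeOf(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return 0L;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // File.length() does not throw, so it can be used inside the stream
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** Size of each immediate subdirectory of one segment, keyed by name. */
    public static Map<String, Long> perSubdir(Path segment) throws IOException {
        Map<String, Long> sizes = new TreeMap<>(); // sorted by subdirectory name
        try (Stream<Path> children = Files.list(segment)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                if (Files.isDirectory(child)) {
                    sizes.put(child.getFileName().toString(), sizeOf(child));
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // usage: java SegmentSizes <path to one segment directory>
        for (Map.Entry<String, Long> e : perSubdir(Paths.get(args[0])).entrySet()) {
            System.out.printf("%-15s %,d bytes%n", e.getKey(), e.getValue());
        }
    }
}
```

Running it on one segment fetched before the change and one fetched after should show whether the `content` entry actually differs.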


RE: Cache problem,

Posted by Phạm Hải Thanh <ph...@vasc.com.vn>.
Oops, sorry, here is the link: http://203.162.71.66:8080/cached.jsp?idx=0&id=1

I also think this is an encoding issue :(

About this config

<property>
  <name>fetcher.store.content</name>
  <value>false</value>
  <description>If true, fetcher will store content.</description>
</property>

I tried it before, but I'm not sure it turns off the cache, because the db is the same size before and after this change. I will try it again.


-----Original Message-----
From: Enzo Michelangeli [mailto:enzomich@gmail.com] 
Sent: 12 June 2007 8:57 AM
To: nutch-user@lucene.apache.org
Subject: Re: Cache problem,

----- Original Message ----- 
From: "Phạm Hải Thanh" <ph...@vasc.com.vn>
Sent: Tuesday, June 12, 2007 9:29 AM

> Hi all,
> I have a problem with the cache: crawling and searching work fine, but the
> cached page is displayed with squares and question marks. Please take a look at
> http://192.168.71.66:8080/cached.jsp?idx=0&id=1. I have tried changing some
> configuration settings, but with no luck. Does anyone have any idea?

Your IP address is non-routable (i.e., valid only on your LAN) and can't be 
accessed by the rest of us from the Internet. But I suspect it's an issue of 
encoding.

> By the way, does anyone know how to turn off caching (i.e., not store the cached data)?

Place this in conf/nutch-site.xml:

<property>
  <name>fetcher.store.content</name>
  <value>false</value>
  <description>If true, fetcher will store content.</description>
</property>

Enzo
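One quick sanity check, when the setting seems to have no effect, is to confirm what conf/nutch-site.xml actually says. The sketch below uses the JDK's built-in DOM parser to look up a property by name in a Hadoop-style `<configuration>` file and returns the first match. It is a hypothetical helper (`SiteConfigCheck`, `lookup` are not Nutch API) and deliberately ignores how Nutch layers nutch-default.xml and nutch-site.xml at runtime, so it only reports what this one file contains.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteConfigCheck {
    /** Returns the trimmed <value> of the first <property> with this <name>, or null. */
    public static String lookup(InputStream xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            if (name.equals(text(p, "name"))) {
                return text(p, "value");
            }
        }
        return null; // property not set in this file
    }

    /** Trimmed text of the first child element with the given tag, or null. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // usage: java SiteConfigCheck conf/nutch-site.xml
        try (InputStream in = new FileInputStream(args[0])) {
            String v = lookup(in, "fetcher.store.content");
            System.out.println("fetcher.store.content = " + (v == null ? "(not set here)" : v));
        }
    }
}
```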
