Posted to user@roller.apache.org by BigLiu <bi...@gmail.com> on 2006/02/21 03:05:35 UTC

register and search in Chinese

I just installed the latest and greatest Roller 2.1_tagging. I am trying to use
this application for Chinese blogs, but currently registration does not allow
usernames in Chinese, and search in Chinese (simplified) does not work either.

Does anybody know how to solve these two problems?

Also, if I want to customize the code to support these two features, how hard
will it be, and do you have any tips?

I really appreciate any help.
--
View this message in context: http://www.nabble.com/register-and-search-in-Chinese-t1159215c12275.html#a3042728
Sent from the Roller - User forum at Nabble.com.


Re: register and search in Chinese

Posted by Allen Gilliland <Al...@Sun.COM>.
By default, the allowed characters for a username are restricted to
"A-Za-z0-9", which is why you can't put Chinese characters into your
username. The same restriction applies to weblog handles.

I'm not sure exactly what the best way is to remove that restriction and
open up usernames and weblog handles to foreign characters.  I think the
restriction is mainly there to prevent usernames and weblog handles from
having bad characters like "+/\]}[{", but it would be nice to allow for
foreign characters.

If you want to change these restrictions you can do so in the
org.roller.presentation.website.actions.UserBaseAction and
org.roller.presentation.website.actions.CreateWebsiteAction.
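
A minimal sketch of what a relaxed check could look like, assuming the restriction is a simple allowed-character pattern. The actual validation code in UserBaseAction and CreateWebsiteAction is not shown in this thread, so the class and method names below are hypothetical. It contrasts the ASCII-only "A-Za-z0-9" rule with a Unicode-aware rule that accepts any letter or digit (including Chinese characters) while still rejecting punctuation such as "+/\]}[{".

```java
import java.util.regex.Pattern;

public class HandleValidator {

    // Hypothetical ASCII-only rule mirroring the "A-Za-z0-9" default described above.
    private static final Pattern ASCII_ONLY = Pattern.compile("[A-Za-z0-9]+");

    // Hypothetical relaxed rule: any Unicode letter (\p{L}, which covers CJK
    // ideographs) or decimal digit (\p{Nd}). Symbols such as + / ] } [ { still fail.
    private static final Pattern LETTERS_AND_DIGITS = Pattern.compile("[\\p{L}\\p{Nd}]+");

    public static boolean isValidAscii(String name) {
        return name != null && ASCII_ONLY.matcher(name).matches();
    }

    public static boolean isValidUnicode(String name) {
        return name != null && LETTERS_AND_DIGITS.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of the compiler's encoding.
        String[] samples = {"allen", "\u5fae\u8f6f", "bad+name", "a/b"};
        for (String s : samples) {
            System.out.println(s + " ascii=" + isValidAscii(s) + " unicode=" + isValidUnicode(s));
        }
    }
}
```

Non-ASCII handles also end up in URLs, so a relaxed rule like this only works end to end if the servlet container decodes request URLs as UTF-8.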

-- Allen






Re: register and search in Chinese

Posted by BigLiu <bi...@gmail.com>.
Thanks a lot for both of your replies. I made the code change, so it now
allows Chinese usernames. But site handles still have a problem (the code
cannot retrieve the site based on a Chinese handle). I think alphanumeric
usernames and site handles are probably good enough, though, since we already
allow screen names in Chinese.

I will do more research on search. It is a must-have, and thanks for the
tip.

Tagging in Chinese is not working either, but compared to search, it is
probably an easy fix.


--
View this message in context: http://www.nabble.com/register-and-search-in-Chinese-t1159215c12275.html#a3044740


Re: register and search in Chinese

Posted by BigLiu <bi...@gmail.com>.
Thanks for the tips.

I have now added URIEncoding="UTF-8" to the Tomcat connector, and it works
without my hack.
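
For reference, URIEncoding is an attribute of the <Connector> element in Tomcat's conf/server.xml; in Tomcat 5.5 it defaults to ISO-8859-1. The sketch below approximates what the setting changes, using java.net.URLDecoder on the query value from this thread rather than Tomcat's own decoder (Java 10+ APIs; the class and method names are illustrative). It also shows why the convertISO88591ToUTF8 workaround posted in this thread worked: ISO-8859-1 maps each byte to exactly one char, so the garbled string still carries the original UTF-8 bytes.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {

    // The query value from the thread: "微软" percent-encoded as UTF-8 bytes.
    public static final String QUERY = "%E5%BE%AE%E8%BD%AF";

    // Decode the percent-encoded bytes with the given charset, roughly as a
    // container does according to its Connector's URIEncoding.
    public static String decodeAs(String query, Charset charset) {
        return URLDecoder.decode(query, charset);
    }

    // The ISO-8859-1 round trip: re-encoding recovers the raw bytes,
    // which are then decoded as UTF-8.
    public static String repairLatin1(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String latin1 = decodeAs(QUERY, StandardCharsets.ISO_8859_1); // Tomcat 5.5 default
        String utf8 = decodeAs(QUERY, StandardCharsets.UTF_8);        // URIEncoding="UTF-8"
        System.out.println(latin1.length());                  // 6: one garbled char per byte
        System.out.println(utf8.equals("\u5fae\u8f6f"));       // true
        System.out.println(repairLatin1(latin1).equals(utf8)); // true
    }
}
```

Setting URIEncoding fixes the decoding once, in the container, instead of repairing strings in every action and JSP page.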


--
View this message in context: http://www.nabble.com/register-and-search-in-Chinese-t1159215c12275.html#a3138974


Re: register and search in Chinese

Posted by Anil Gangolli <an...@busybuddha.org>.
We expect URI encoding to be UTF-8 uniformly.  You shouldn't need to do 
any hacks to convert character sets.  You should check the URIEncoding 
attribute in your server's Connector configuration (assuming Tomcat) and 
look at our installation guide.  If you are still seeing problems, 
please file issues on our issue tracker with specific examples that 
fail.  I'll try to look into them, and may need to work with you to 
diagnose them.





Re: register and search in Chinese

Posted by BigLiu <bi...@gmail.com>.
Just to report that Chinese search works for me now.

I switched to ChineseAnalyzer.

Then, in some places in the Java code and JSP pages, I needed to convert the
ISO-8859-1 strings decoded from the URL back to UTF-8.

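import java.io.UnsupportedEncodingException;

public static String convertISO88591ToUTF8(String str) {
    if (str == null) {
        return null;
    }
    try {
        // ISO-8859-1 maps each byte to exactly one char, so re-encoding
        // recovers the original bytes, which are then decoded as UTF-8.
        return new String(str.getBytes("ISO-8859-1"), "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // Every Java platform must support both charsets, so this should not happen.
        return str;
    }
}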
--
View this message in context: http://www.nabble.com/register-and-search-in-Chinese-t1159215c12275.html#a3129092


Re: register and search in Chinese

Posted by BigLiu <bi...@gmail.com>.
Still need more help here. I switched to the Chinese analyzer, but the problem
is that the encoding seems to get messed up somewhere in the GUI.

1. After you type Chinese into the search box on the front page, for example
"微软" (meaning Microsoft), the address bar of the results page shows the URL
"http://localhost:8080/roller/sitesearch.do?q=%E5%BE%AE%E8%BD%AF", and the
search box shows garbled characters instead of "微软".

2. On the same results page, if I change the URL to
http://localhost:8080/roller/sitesearch.do?q=微软 and hit Return, the
resulting URL becomes http://localhost:8080/roller/sitesearch.do?q=%CE%A2%C8%ED.
Now the same two characters are encoded as 4 bytes instead of the 6 UTF-8
bytes above.
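
The byte counts can be checked directly. This small sketch (Java 10+, illustrative class name) percent-encodes "微软" both ways: UTF-8 needs 3 bytes per character and reproduces the first URL, while GBK, the Simplified Chinese Windows code page, needs 2 bytes per character and matches the second URL. That the browser used the system code page for the hand-typed URL is an assumption consistent with these bytes, not something confirmed in the thread.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryEncodingDemo {

    // Percent-encode text using the given charset (form-style encoding).
    public static String percentEncode(String text, Charset charset) {
        return URLEncoder.encode(text, charset);
    }

    public static void main(String[] args) {
        String term = "\u5fae\u8f6f"; // "微软"
        // UTF-8: 3 bytes per character here, so 6 percent-escapes.
        System.out.println(percentEncode(term, StandardCharsets.UTF_8)); // %E5%BE%AE%E8%BD%AF
        // GBK: 2 bytes per character, so 4 percent-escapes. GBK ships with
        // standard JDKs but is not a required charset, hence the check.
        if (Charset.isSupported("GBK")) {
            System.out.println(percentEncode(term, Charset.forName("GBK"))); // %CE%A2%C8%ED
        }
    }
}
```

If the server then decodes those GBK bytes as UTF-8, the result is not the original query, which fits the failures described here.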


I tried the same thing on http://www.jroller.com. In the first scenario, even
though no results were returned, the two Chinese characters stayed in the
search box instead of turning into garbled characters. In the second case, it
gave me an HTTP 500 error with a NullPointerException.

I am using Roller 2.1_tagging, running on Tomcat 5.5 on Windows, with MySQL 5.0
in UTF-8. The JDK is 1.5.06 (I hope it is not a JDK compatibility issue).

 




--
View this message in context: http://www.nabble.com/register-and-search-in-Chinese-t1159215c12275.html#a3103892


Re: register and search in Chinese

Posted by Ian Kallen <ik...@technorati.com>.
BigLiu wrote:
> And search in Chinese (simplified) also
> does not work.
>
> Does anybody know how to solve these two problems? 
>
> Also if I want to customize the code to support these two features, how hard
> will it be and any tips?
>   
You'll want to incorporate a Lucene analyzer that understands Chinese
text. There's a CJK analyzer, and one that's specifically for Chinese, in
the Lucene sandbox. See
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/cn/ChineseAnalyzer.java
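
To illustrate why the analyzer matters: the old Lucene contrib documentation describes ChineseTokenizer as emitting each Chinese character as its own token, and CJKTokenizer as emitting overlapping pairs of adjacent characters. The standalone sketch below is plain Java, not Lucene's API, and its names are hypothetical. It reproduces those two strategies for a single run of Chinese text, assuming the input contains no spaces or Latin characters.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkTokenizationSketch {

    // Unigram strategy (as documented for ChineseTokenizer):
    // each character of the run becomes its own token.
    public static List<String> unigrams(String run) {
        int[] cps = run.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < cps.length; i++) {
            tokens.add(new String(cps, i, 1));
        }
        return tokens;
    }

    // Overlapping-bigram strategy (as documented for CJKTokenizer):
    // every pair of adjacent characters becomes a token.
    public static List<String> bigrams(String run) {
        int[] cps = run.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            tokens.add(new String(cps, 0, 1)); // sketch choice: keep a lone character
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String run = "\u5fae\u8f6f\u516c\u53f8"; // "微软公司" (Microsoft Corporation)
        System.out.println(unigrams(run));
        System.out.println(bigrams(run));
    }
}
```

For "微软公司" the unigram strategy yields four single-character tokens and the bigram strategy yields "微软", "软公", "公司". Whichever strategy is used, the same analyzer has to be applied at index time and at query time so that query tokens line up with indexed tokens.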

Recommended reading for Lucene is "Lucene in Action" by Hatcher and
Gospodnetic.
-Ian

-- 
Ian Kallen || Architect, Technorati Inc. || m: 415.505.5208
http://www.arachna.com/roller/page/spidaman