Posted to user@nutch.apache.org by J B <be...@hotmail.com> on 2005/05/30 19:46:02 UTC
Searching with Ö and Ä?
Hello,
Is there anyone who can help me configure Nutch so that I can use it for
Swedish or German websites containing characters like "ö" and "ä"? Crawling
and indexing seem to work fine; it's just the searching that goes wrong.
When I enter a search string like "Köln", knowing that it appears in the
text, the results page says that there are no matching results, and the "ö" is
replaced by random characters...
I have searched the docs and the web, but I can't find the answer to my
problem.
Best regards,
Jon
P.S. Sorry if two versions of this message reached the list; I am quite new
to this...
Re: Searching with Ö and Ä?
Posted by Andrzej Bialecki <ab...@getopt.org>.
J B wrote:
> Hello,
>
> Is there anyone who can help me configure Nutch so that I can use it for
> Swedish or German websites containing characters like "ö" and "ä"?
> Crawling and indexing seem to work fine; it's just the searching that
> goes wrong. When I enter a search string like "Köln", knowing that it
> appears in the text, the results page says that there are no matching
> results, and the "ö" is replaced by random characters...
>
> I have searched the docs and the web, but I can't find the answer to my
> problem.
The characters are not random - they correspond to a URL-encoding of the
UTF-8 encoding of Latin-1 characters, whereas they should be a
URL-encoding of the UTF-8 encoding of UTF-8 characters.
;-)
For the US-ASCII range both of the above give the same result, but for
all other characters the first one gives wrong results.
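One common way such a mismatch shows up is when the browser sends the query as URL-encoded UTF-8 bytes and the receiving side decodes those bytes as Latin-1 (ISO-8859-1). The following is only a minimal sketch of that effect in plain Java (assuming Java 10 or later); the class and method names are hypothetical and are not part of Nutch or Tomcat.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Nutch or Tomcat.
public class MojibakeDemo {

    // URL-encode the text the way a browser on a UTF-8 page would.
    static String encodeUtf8(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // URL-decode the text, interpreting the bytes in the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String query = "K\u00f6ln";              // "Köln", written with an escape
        String onTheWire = encodeUtf8(query);    // "ö" becomes the two bytes C3 B6

        System.out.println(onTheWire);
        // Decoding with the same charset restores the original text.
        System.out.println(decode(onTheWire, StandardCharsets.UTF_8));
        // Decoding as Latin-1 turns each of the two bytes into its own character.
        System.out.println(decode(onTheWire, StandardCharsets.ISO_8859_1));
    }
}
```

With a pure US-ASCII query such as "Berlin", both decoders return the same string, which is why the problem only appears for characters like "ö" and "ä".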
Please make sure that you set the page encoding to UTF-8 in your JSPs and
HTML pages, and preferably set the same value as the default character
encoding somewhere in the configuration of your servlet engine. As the old
hands say: "choose UTF-8 and stick to it religiously".
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
RE: Searching with Ö and Ä?
Posted by Chirag Chaman <de...@filangy.com>.
Jon,
You'll need to set encoding to UTF-8.
We don't use the default Nutch JSP pages, so I'm not sure if they have it or
not, but here's the simplified process.
1. Make sure your JSP files have something like this at the top:
<%@ page contentType="text/html; charset=utf-8" pageEncoding="utf-8" %>
2. Your Tomcat server.xml should set URIEncoding="UTF-8" on the Connector:
<Connector port="80"
    maxThreads="250" minSpareThreads="25" maxSpareThreads="75"
    enableLookups="false" redirectPort="8443" acceptCount="100"
    connectionTimeout="15000" disableUploadTimeout="180000"
    URIEncoding="UTF-8" useBodyEncodingForURI="false" />
This should take care of it.
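If the server configuration cannot be changed, a workaround sometimes used is to re-interpret an already mis-decoded parameter in application code. The sketch below assumes the parameter was sent as UTF-8 and wrongly decoded as ISO-8859-1; the class and method names are hypothetical, and this is not a substitute for the configuration above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch or Tomcat.
public class QueryRepair {

    // Recover the original bytes by encoding the garbled string as Latin-1,
    // then decode those bytes as the UTF-8 they really were.
    static String reinterpretAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "K", U+00C3, U+00B6, "ln": what "Köln" looks like after the wrong decoding.
        String garbled = "K\u00c3\u00b6ln";
        System.out.println(reinterpretAsUtf8(garbled));
    }
}
```

This only works when the input really was mis-decoded; applying it to a string that is already correct would damage any non-ASCII characters in it, which is one reason fixing the encoding at the configuration level is the safer route.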
Regards,
CC
--------------------------------------------
Filangy, Inc.
Interested in Improving Search? Join our Team!
http://filangy.com/jointheteam.jsp