Posted to solr-user@lucene.apache.org by "henri.gouraud@laposte.net" <he...@laposte.net> on 2012/03/29 16:42:01 UTC

UTF-8 encoding

I can't get UTF-8 encoding to work!

I have        <str name="v.contentType">text/html;charset=UTF-8</str>

in my request handler, and 
input.encoding=UTF-8
output.encoding=UTF-8
in velocity.properties, in various locations (I may have the wrong ones! At
least it is in the folder where the .vm files reside).

What else should I be doing or configuring?

Thanks
Henri

--
View this message in context: http://lucene.472066.n3.nabble.com/UTF-8-encoding-tp3867885p3867885.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: UTF-8 encoding

Posted by "henri.gouraud@laposte.net" <he...@laposte.net>.
Paul,

The velocity.properties settings are in place.
One thing I am not 100% sure about is where this file should reside.
I have placed it in the example/solr/conf/velocity folder (where the .vm
files reside).

Cheers,
Henri



Re: UTF-8 encoding

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Henri,

Look in velocity.properties. I have this there:
> input.encoding          = UTF-8

Do you also?

This sets the encoding used to read the .vm files.
Of course, also make sure you edit these files in UTF-8 (jEdit has proven trustworthy for this in my experience).
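
One way to double-check that a template really is stored as UTF-8 is a strict decode of its bytes. The following is only a minimal sketch: the class name Utf8Check is made up for illustration, and a file that contains nothing but ASCII will pass whatever the editor claimed to save it as.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Solr or Velocity.
class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 with no malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Check the file given on the command line, e.g. one of the .vm templates.
            byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
            System.out.println(args[0] + ": " + (isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8"));
        } else {
            // Built-in samples: the same accented letter saved in two encodings.
            System.out.println(isValidUtf8("\u00e9".getBytes(StandardCharsets.UTF_8)));      // true
            System.out.println(isValidUtf8("\u00e9".getBytes(StandardCharsets.ISO_8859_1))); // false
        }
    }
}
```

Run it with the path of a .vm file as its only argument; a "not valid UTF-8" result would suggest the editor saved the file in another encoding.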

paul


On 30 March 2012 at 08:49, henri.gouraud@laposte.net wrote:

> OK, I'll try to provide more details:
> I am using solr-3.5.0.
> I am running the example provided in the package.
> Some of the modifications I have made in the various velocity/*.vm files
> contain accented characters.
> It is those accents that show up garbled when I look at the results.
> The .vm files are UTF-8 encoded.
> Solr itself behaves correctly and handles the UTF-8 characters fine.
> My browser is UTF-8 ready and correctly displays the results returned by Solr.
> 
> Cheers,
> Henri
> 
> 
> 


Re: UTF-8 encoding

Posted by "henri.gouraud@laposte.net" <he...@laposte.net>.
OK, I'll try to provide more details:
I am using solr-3.5.0.
I am running the example provided in the package.
Some of the modifications I have made in the various velocity/*.vm files
contain accented characters.
It is those accents that show up garbled when I look at the results.
The .vm files are UTF-8 encoded.
Solr itself behaves correctly and handles the UTF-8 characters fine.
My browser is UTF-8 ready and correctly displays the results returned by Solr.
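
To make "garbled" concrete: the symptom looks like what happens when UTF-8 bytes are read with a single-byte charset. The sketch below assumes the reader falls back to ISO-8859-1, which is my guess at what is happening here, not something confirmed; the class name MojibakeDemo is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only reproduces the symptom, it does not touch Solr.
class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes with the wrong charset,
    // as a template reader configured for ISO-8859-1 would.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Each accented letter turns into two characters once misread.
        System.out.println(misread("\u00e9t\u00e9"));
    }
}
```

Under that assumption a single "é" (U+00E9) comes back as the two characters "Ã©" (U+00C3, U+00A9), which matches the kind of output I am seeing.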

Cheers,
Henri




Re: UTF-8 encoding

Posted by Erick Erickson <er...@gmail.com>.
I doubt that the pre-installed Jetty server has problems with UTF-8, although
you haven't told us what version of Solr you're running on so it could be really
old.

And you also haven't told us why you think UTF-8 is a problem. How is this
manifesting itself? Failed searches? Failed indexing? ???

There's also some possibility that, if your problem is with
searching from the browser, your _browser_ isn't
configured to handle UTF-8, for instance.

Best
Erick

On Thu, Mar 29, 2012 at 12:17 PM, henri.gouraud@laposte.net
<he...@laposte.net> wrote:
> Thanks for the tips, but unfortunately, no progress so far.
> From what I read on the Web, I guess that Jetty has UTF-8 problems!
> I guess that I will have to switch from the embedded (and preinstalled,
> hence easy) Jetty server shipped with Solr to Tomcat (for which I have to
> rediscover the installation issues!).
>
> Cheers,
>
> Henri
>

Re: UTF-8 encoding

Posted by "henri.gouraud@laposte.net" <he...@laposte.net>.
Thanks for the tips, but unfortunately, no progress so far.
From what I read on the Web, I guess that Jetty has UTF-8 problems!
I guess that I will have to switch from the embedded (and preinstalled,
hence easy) Jetty server shipped with Solr to Tomcat (for which I have to
rediscover the installation issues!).

Cheers,

Henri


Re: UTF-8 encoding

Posted by Erik Hatcher <er...@gmail.com>.
Apologies for not replying sooner on this thread; I just noticed it today.

To add insight into where velocity.properties can reside, it is used this way in VelocityResponseWriter.java:

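    // Abridged excerpt: the declaration of "is" and the error handling are left out.
    SolrVelocityResourceLoader resourceLoader =
        new SolrVelocityResourceLoader(request.getCore().getSolrConfig().getResourceLoader());
    // The file name comes from the v.properties request parameter.
    String propFile = request.getParams().get("v.properties");
    // The file is located through Solr's resource loader and handed to the Velocity engine.
    is = resourceLoader.getResourceStream(propFile);
    Properties props = new Properties();
    props.load(is);
    engine.init(props);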
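
Since the file goes through java.util.Properties, both spellings seen in this thread (with and without spaces around the equals sign) should parse to the same value. A small standalone sketch, with a made-up class name and no Solr or Velocity classes involved:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo class; it only exercises java.util.Properties.
class VelocityPropsDemo {
    // Parses .properties text the same way props.load(is) does for a stream.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse("input.encoding          = UTF-8\noutput.encoding=UTF-8\n");
        System.out.println(props.getProperty("input.encoding"));  // UTF-8
        System.out.println(props.getProperty("output.encoding")); // UTF-8
    }
}
```

So the formatting of the lines is unlikely to matter; what matters is whether the file is found and loaded at all.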

SolrVelocityResourceLoader is a pass-through hook to Solr's ResourceLoader, allowing it to load resources from:

  /** Opens any resource by its name.
   * By default, this will look in multiple locations to load the resource:
   * $configDir/$resource (if resource is not absolute)
   * $CWD/$resource
   * otherwise, it will look for it in any jar accessible through the class loader.
   * Override this method to customize loading resources.
   *@return the stream for the named resource
   */
  public InputStream openResource(String resource)

So that file could conceivably live in many places, but conf/ is where I'd put it.
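
To illustrate why a file sitting in conf/velocity is not picked up by its plain name, here is a rough sketch of the lookup order described in that Javadoc. It is a stand-in written for this mail, not Solr's actual implementation; the class ResourceLookupSketch and its locate method are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the documented lookup order; not Solr code.
class ResourceLookupSketch {
    // Reports which of the documented locations would supply the resource.
    static String locate(Path configDir, String resource) {
        Path asGiven = Paths.get(resource);
        if (!asGiven.isAbsolute() && Files.isReadable(configDir.resolve(resource))) {
            return "configDir";
        }
        if (Files.isReadable(asGiven)) {
            return "filesystem"; // relative to the working directory, or an absolute path
        }
        if (ResourceLookupSketch.class.getClassLoader().getResource(resource) != null) {
            return "classpath";
        }
        return "not found";
    }

    public static void main(String[] args) throws IOException {
        Path conf = Files.createTempDirectory("conf");
        Path templates = Files.createDirectory(conf.resolve("velocity"));
        byte[] content = "input.encoding=UTF-8\n".getBytes(StandardCharsets.UTF_8);

        // File placed next to the templates: the plain name does not resolve via configDir.
        Files.write(templates.resolve("velocity.properties"), content);
        System.out.println(locate(conf, "velocity.properties"));

        // After moving it up into conf/, the plain name resolves via configDir.
        Files.move(templates.resolve("velocity.properties"), conf.resolve("velocity.properties"));
        System.out.println(locate(conf, "velocity.properties"));
    }
}
```

The first line printed depends on whether a velocity.properties happens to exist in the working directory or on the classpath; the second should be "configDir".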

I've just updated the wiki documentation to say this instead:

v.properties: specifies a Velocity properties file to be applied, found using the Solr resource loader mechanism. If not specified, no .properties file is loaded. Example: v.properties=velocity.properties where velocity.properties can be found using Solr's resource loader mechanism, for example in the conf/ directory (not conf/velocity which is for templates only). The .properties file could also be located inside a JAR in the lib/ directory, or other locations.

Feel free to modify that if it needs improving.

Thanks,
	Erik


On Apr 4, 2012, at 04:29 , henri wrote:

> I have finally solved my problem!!
> 
> Did the following:
> 
> added two lines in the /browse requestHandler
>       <str name="v.properties">velocity.properties</str>
>       <str name="v.contentType">text/html;charset=UTF-8</str>
> 
> Moved velocity.properties from solr/conf/velocity to solr/conf
> 
> Not being an expert, I am not 100% sure this is the "best" solution, nor
> where and how it should be documented in the solr/velocity package. I will
> leave this doc update to the aficionados.
> 
> Cheers to all,
> Henri
> 


Re: UTF-8 encoding

Posted by henri <he...@laposte.net>.
I have finally solved my problem!!

Did the following:

added two lines in the /browse requestHandler
       <str name="v.properties">velocity.properties</str>
       <str name="v.contentType">text/html;charset=UTF-8</str>

Moved velocity.properties from solr/conf/velocity to solr/conf

Not being an expert, I am not 100% sure this is the "best" solution, nor
where and how it should be documented in the solr/velocity package. I will
leave this doc update to the aficionados.

Cheers to all,
Henri


Re: UTF-8 encoding

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Also, in case you use Apache's mod_proxy, be sure to use the nocanon attribute.
(I don't know of an equivalent for mod_rewrite).

In general, I also tend to advise changing the default encoding of the JVM running the servlets... but I am sure you've done this.
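
If you want to see what the JVM currently uses, a tiny check like the one below prints it. This is just a sketch with a made-up class name. The default is commonly overridden by passing -Dfile.encoding=UTF-8 on the java command line that starts the container; how well that is honored can depend on the JVM, so treat it as something to verify rather than a guarantee.

```java
import java.nio.charset.Charset;

// Hypothetical helper; run it with the same java binary and options as the servlet container.
class DefaultEncodingCheck {
    // Summarizes the encoding-related defaults of the running JVM.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```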

Tell us about your success or lack thereof; I'm interested, and I am sure others are.

paul


On 29 March 2012 at 16:49, Bob Sandiford wrote:

> Hi, Henri.
> 
> Make sure that the container in which you are running Solr is also set for UTF-8.
> 
> For example, in Tomcat, in the server.xml file, your Connector definitions should include:
> 	URIEncoding="UTF-8"
> 
> 
>> -----Original Message-----
>> From: henri.gouraud@laposte.net [mailto:henri.gouraud@laposte.net]
>> Sent: Thursday, March 29, 2012 10:42 AM
>> To: solr-user@lucene.apache.org
>> Subject: UTF-8 encoding
>> 
>> I can't get UTF-8 encoding to work!
>> 
>> I have        <str name="v.contentType">text/html;charset=UTF-8</str>
>> 
>> in my request handler, and
>> input.encoding=UTF-8
>> output.encoding=UTF-8
>> in velocity.properties, in various locations (I may have the wrong ones! At
>> least it is in the folder where the .vm files reside).
>> 
>> What else should I be doing or configuring?


RE: UTF-8 encoding

Posted by Bob Sandiford <bo...@sirsidynix.com>.
Hi, Henri.

Make sure that the container in which you are running Solr is also set for UTF-8.

For example, in Tomcat, in the server.xml file, your Connector definitions should include:
	URIEncoding="UTF-8"
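
To show why this setting matters for query strings, the sketch below decodes the same percent-encoded value with two charsets. It assumes the connector falls back to ISO-8859-1 when URIEncoding is not set, as older Tomcat releases do; the class name UriDecodingDemo is made up, and the code only uses java.net.URLDecoder, not Tomcat itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it imitates the decoding step, not Tomcat's code.
class UriDecodingDemo {
    // Decodes a percent-encoded query value using the given charset name.
    static String decode(String raw, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String raw = "caf%C3%A9"; // "café" percent-encoded from its UTF-8 bytes
        System.out.println(decode(raw, "UTF-8"));      // decoded as intended
        System.out.println(decode(raw, "ISO-8859-1")); // the two bytes become two characters
    }
}
```

With the wrong charset the search term reaches Solr already corrupted, so queries containing accents would fail to match even though the index itself is fine.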

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | Bob.Sandiford@sirsidynix.com
www.sirsidynix.com


> -----Original Message-----
> From: henri.gouraud@laposte.net [mailto:henri.gouraud@laposte.net]
> Sent: Thursday, March 29, 2012 10:42 AM
> To: solr-user@lucene.apache.org
> Subject: UTF-8 encoding
> 
> I can't get UTF-8 encoding to work!
> 
> I have        <str name="v.contentType">text/html;charset=UTF-8</str>
> 
> in my request handler, and
> input.encoding=UTF-8
> output.encoding=UTF-8
> in velocity.properties, in various locations (I may have the wrong ones! At
> least it is in the folder where the .vm files reside).
> 
> What else should I be doing or configuring?
> 
> Thanks
> Henri
> 