Posted to solr-user@lucene.apache.org by Shalin Shekhar Mangar <sh...@gmail.com> on 2009/08/26 09:10:17 UTC

Re: encoding problem

On Wed, Aug 26, 2009 at 10:24 AM, Bernadette Houghton <
bernadette.houghton@deakin.edu.au> wrote:

> We have an encoding problem with our solr application. That is, non-ASCII
> chars display fine in SOLR, but as gobbledegook in our application.
>
> Our tomcat server.xml file already contains URIEncoding="UTF-8" under the
> relevant <Connector>.
>
> A google search reveals that I should set the encoding for the JVM, but
> have no idea how to do this. I'm running Windows, and there is no tomcat
> process in my Windows Services.
>

Add the following parameter to the JVM:

-Dfile.encoding=UTF-8

-- 
Regards,
Shalin Shekhar Mangar.

RE: encoding problem

Posted by Bernadette Houghton <be...@deakin.edu.au>.
Finally resolved the problem! The solution was three-pronged on my Windows PC:

Added to my.ini under the [mysqld] section:
default-character-set=utf8
collation_server=utf8_unicode_ci
character_set_server=utf8
skip-character-set-client-handshake

Added to the JAVA_OPTS environment variable:
-Dfile.encoding=UTF-8

Added to the beginning of Tomcat's startup.bat (positioning is important!):
set JAVA_OPTS="-Dfile.encoding=UTF-8"

Thanks to everyone for their much appreciated help!

Bern

-----Original Message-----
From: Bernadette Houghton [mailto:bernadette.houghton@deakin.edu.au] 
Sent: Monday, 31 August 2009 9:18 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: encoding problem

Still having a few issues with encoding, although I've been able to resolve the particular issue below by just re-editing the affected record.

The other encoding issue is with Greek characters. With solr turned off in our user-facing application, Greek characters, e.g. α,ω (small alpha, small omega), display correctly. But with solr turned on, garbage displays instead. If we enter the characters as decimal character references (e.g. &#969;), everything displays OK with or without solr. Does this suggest anything to anyone?

TIA
bern

RE: encoding problem

Posted by Bernadette Houghton <be...@deakin.edu.au>.
Still having a few issues with encoding, although I've been able to resolve the particular issue below by just re-editing the affected record.

The other encoding issue is with Greek characters. With solr turned off in our user-facing application, Greek characters, e.g. α,ω (small alpha, small omega), display correctly. But with solr turned on, garbage displays instead. If we enter the characters as decimal character references (e.g. &#969;), everything displays OK with or without solr. Does this suggest anything to anyone?

TIA
bern
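
One possible reading of the Greek-character symptom, sketched in Python purely to show the byte-level behaviour (it is not part of Solr, Tomcat, or the repository software): a decimal reference such as &#969; consists only of ASCII characters, so it passes unchanged through any step that confuses UTF-8 with a single-byte charset, whereas a raw ω is two UTF-8 bytes that such a step splits into two unrelated characters. The choice of Windows-1252 as the wrong charset and the helper name misread_utf8_as_cp1252 are assumptions made for illustration.

```python
import html


def misread_utf8_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as Windows-1252 (assumed wrong charset)."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


raw_greek = "\u03b1,\u03c9"        # small alpha, small omega typed directly
as_references = "&#945;,&#969;"    # the same two characters as decimal references

garbled = misread_utf8_as_cp1252(raw_greek)
untouched = misread_utf8_as_cp1252(as_references)

print(garbled)                     # two-character garbage per Greek letter
print(untouched)                   # identical to as_references: pure ASCII survives
print(html.unescape(untouched))    # a browser-style unescape restores the Greek letters
```

If the real pipeline behaves like this sketch, the garbage seen with solr turned on would point to one layer reading or writing UTF-8 bytes with a single-byte default charset.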

-----Original Message-----
From: Bernadette Houghton [mailto:bernadette.houghton@deakin.edu.au] 
Sent: Friday, 28 August 2009 9:31 AM
To: 'solr-user@lucene.apache.org'; 'yonik@lucidimagination.com'
Subject: RE: encoding problem

Shalin, the XML from solr admin for the relevant field is displaying as -

<str name="citation_t"><a title="Browse by Author Name for Moncrieff, Joan" href="/fez/list/author/Moncrieff%2C+Joan/">Moncrieff, Joan</a>, <a title="Browse by Author Name for Macauley, Peter" href="/fez/list/author/Macauley%2C+Peter/">Macauley, Peter</a> and <a title="Browse by Author Name for Epps, Janine" href="/fez/list/author/Epps%2C+Janine/">Epps, Janine</a> <a title="Browse by Year 2006" href="/fez/list/year/2006/">2006</a>, <a title="Click to view Journal, Media Article: &ldquo;My Universe is Here&rdquo;: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers" href="/fez/view/changeme:156">“My Universe is Here�: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers</a><i></i>, vol. 38, no. 2, pp. 71-83.</str>


The weird thing is that the title displays OK in one place (the "title" attribute, where the quotes are written as &ldquo; and &rdquo;), but not in the link text after the "href" bit.

bern

RE: encoding problem

Posted by Bernadette Houghton <be...@deakin.edu.au>.
Shalin, the XML from solr admin for the relevant field is displaying as -

<str name="citation_t"><a title="Browse by Author Name for Moncrieff, Joan" href="/fez/list/author/Moncrieff%2C+Joan/">Moncrieff, Joan</a>, <a title="Browse by Author Name for Macauley, Peter" href="/fez/list/author/Macauley%2C+Peter/">Macauley, Peter</a> and <a title="Browse by Author Name for Epps, Janine" href="/fez/list/author/Epps%2C+Janine/">Epps, Janine</a> <a title="Browse by Year 2006" href="/fez/list/year/2006/">2006</a>, <a title="Click to view Journal, Media Article: &ldquo;My Universe is Here&rdquo;: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers" href="/fez/view/changeme:156">“My Universe is Here�: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers</a><i></i>, vol. 38, no. 2, pp. 71-83.</str>


The weird thing is that the title displays OK in one place (the "title" attribute, where the quotes are written as &ldquo; and &rdquo;), but not in the link text after the "href" bit.

bern
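
The damage in the link text is lopsided: the opening curly quote is readable, while the closing one shows as a replacement character. That pattern is at least consistent with UTF-8 bytes having been read as Windows-1252 somewhere along the way. The closing quote's UTF-8 encoding ends in byte 0x9D, which Windows-1252 leaves undefined, so it cannot be recovered afterwards, while the opening quote (ending in 0x9C) can. The Python sketch below only illustrates that asymmetry. The helper name is hypothetical and Windows-1252 is an assumption, not something confirmed in this thread.

```python
def survives_cp1252_round_trip(text: str) -> bool:
    """True if UTF-8 bytes misread as Windows-1252 can be converted back losslessly."""
    try:
        misread = text.encode("utf-8").decode("cp1252")
    except UnicodeDecodeError:
        # Some byte (for example 0x9D) has no Windows-1252 mapping.
        return False
    return misread.encode("cp1252").decode("utf-8") == text


opening_quote = "\u201c"   # left double quotation mark, UTF-8 bytes E2 80 9C
closing_quote = "\u201d"   # right double quotation mark, UTF-8 bytes E2 80 9D

print(survives_cp1252_round_trip(opening_quote))
print(survives_cp1252_round_trip(closing_quote))
```

The entity form in the title attribute (&ldquo; and &rdquo;) is plain ASCII, which would explain why that copy of the title is unaffected.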

Re: encoding problem

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Have you determined if the problem is on the indexing side or the
query side?  I don't see any reason you should have to set/change any
encoding in the JVM.

-Yonik
http://www.lucidimagination.com
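
One way to narrow down indexing side versus query side is to capture the raw bytes at each hop (the document as posted to Solr, the Solr response, the page the application serves) and classify them. The Python sketch below is a rough heuristic under stated assumptions: the function name diagnose is hypothetical, Windows-1252 is assumed as the likely wrong charset, and how the bytes are captured is not shown.

```python
def diagnose(raw: bytes) -> str:
    """Roughly classify captured bytes; a heuristic, not a proof."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    if text.isascii():
        return "plain ASCII"
    try:
        # If the text can be squeezed back into Windows-1252 bytes that are
        # themselves valid UTF-8, it was probably encoded twice.
        text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return "valid UTF-8"
    return "probably double-encoded UTF-8"


good = "\u201cMy Universe is Here\u201d".encode("utf-8")
doubled = "\u00e2\u20ac\u0153".encode("utf-8")   # an opening quote encoded twice

print(diagnose(good))
print(diagnose(doubled))
print(diagnose(b"\xe9"))
```

The first hop whose bytes stop being reported as valid UTF-8 is the likeliest place to look.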



On Thu, Aug 27, 2009 at 7:03 PM, Bernadette
Houghton<be...@deakin.edu.au> wrote:
> Hi Shalin, strangely, things still aren't working. I've tried setting JAVA_OPTS both through the GUI and in startup.bat, but with absolutely no impact. I have also tried reindexing, but still no impact - results such as -
>
> “My Universe is Here�
>
> bern
>
> -----Original Message-----
> From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com]
> Sent: Wednesday, 26 August 2009 5:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: encoding problem
>
> On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton <
> bernadette.houghton@deakin.edu.au> wrote:
>
>> Thanks for your quick reply, Shalin.
>>
>> Tomcat is running on my Windows machine, but does not appear in Windows
>> Services (as I was expecting it should ... am I wrong?). I'm running it from
>> a startup.bat on my desktop - see below. Do I add the Dfile line to the
>> startup.bat?
>>
>> SOLR is part of the repository software that we are running.
>>
>
> Tomcat respects an environment variable called JAVA_OPTS through which you
> can pass any jvm argument (e.g. heap size, file encoding). Set
> JAVA_OPTS="-Dfile.encoding=UTF-8" either through the GUI or by adding the
> following to startup.bat:
>
> set JAVA_OPTS="-Dfile.encoding=UTF-8"
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

RE: encoding problem

Posted by Bernadette Houghton <be...@deakin.edu.au>.
Hi Shalin, strangely, things still aren't working. I've tried setting JAVA_OPTS both through the GUI and in startup.bat, but with absolutely no impact. I have also tried reindexing, but still no impact - results such as -

“My Universe is Here�

bern


Re: encoding problem

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton <
bernadette.houghton@deakin.edu.au> wrote:

> Thanks for your quick reply, Shalin.
>
> Tomcat is running on my Windows machine, but does not appear in Windows
> Services (as I was expecting it should ... am I wrong?). I'm running it from
> a startup.bat on my desktop - see below. Do I add the Dfile line to the
> startup.bat?
>
> SOLR is part of the repository software that we are running.
>

Tomcat respects an environment variable called JAVA_OPTS through which you
can pass any JVM argument (e.g. heap size, file encoding). Set
JAVA_OPTS="-Dfile.encoding=UTF-8" either through the GUI or by adding the
following to startup.bat:

set JAVA_OPTS="-Dfile.encoding=UTF-8"

-- 
Regards,
Shalin Shekhar Mangar.

RE: encoding problem

Posted by Bernadette Houghton <be...@deakin.edu.au>.
Thanks for your quick reply, Shalin.

Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the -Dfile.encoding line to startup.bat?

SOLR is part of the repository software that we are running.

Thanks!

BERN

Startup.bat -
@echo off
if "%OS%" == "Windows_NT" setlocal
rem ---------------------------------------------------------------------------
rem Start script for the CATALINA Server
rem
rem $Id: startup.bat 302918 2004-05-27 18:25:11Z yoavs $
rem ---------------------------------------------------------------------------

rem Guess CATALINA_HOME if not defined
set CURRENT_DIR=%cd%
if not "%CATALINA_HOME%" == "" goto gotHome
set CATALINA_HOME=%CURRENT_DIR%
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
cd ..
set CATALINA_HOME=%cd%
cd %CURRENT_DIR%
:gotHome
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
echo The CATALINA_HOME environment variable is not defined correctly
echo This environment variable is needed to run this program
goto end
:okHome

set EXECUTABLE=%CATALINA_HOME%\bin\catalina.bat

rem Check that target executable exists
if exist "%EXECUTABLE%" goto okExec
echo Cannot find %EXECUTABLE%
echo This file is needed to run this program
goto end
:okExec

rem Get remaining unshifted command line arguments and save them in the
set CMD_LINE_ARGS=
:setArgs
if ""%1""=="""" goto doneSetArgs
set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
shift
goto setArgs
:doneSetArgs

call "%EXECUTABLE%" start %CMD_LINE_ARGS%

:end




Re: encoding problem

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton <
bernadette.houghton@deakin.edu.au> wrote:

> Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I
> access the JVM???
>

When you execute the java executable, just add -Dfile.encoding=UTF-8 as a
command line argument to the executable.

How are you consuming Solr? You mentioned there is no Tomcat; is your Solr
client a desktop Java application?

-- 
Regards,
Shalin Shekhar Mangar.

RE: encoding problem

Posted by Bernadette Houghton <be...@deakin.edu.au>.
Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access the JVM???

Regards
Bern

