Posted to solr-user@lucene.apache.org by Shalin Shekhar Mangar <sh...@gmail.com> on 2009/08/26 09:10:17 UTC
Re: encoding problem
On Wed, Aug 26, 2009 at 10:24 AM, Bernadette Houghton <
bernadette.houghton@deakin.edu.au> wrote:
> We have an encoding problem with our Solr application: non-ASCII
> characters display fine in Solr, but as gobbledegook in our application.
>
> Our Tomcat server.xml file already contains URIEncoding="UTF-8" under the
> relevant <Connector>.
>
> A Google search reveals that I should set the encoding for the JVM, but
> I have no idea how to do this. I'm running Windows, and there is no Tomcat
> process in my Windows Services.
>
Add the following parameter to the JVM:
-Dfile.encoding=UTF-8
--
Regards,
Shalin Shekhar Mangar.
RE: encoding problem
Posted by Bernadette Houghton <be...@deakin.edu.au>.
Finally resolved the problem! The solution was three-pronged on my Windows PC:
1. Added to my.ini under the [mysqld] section:
default-character-set=utf8
collation_server=utf8_unicode_ci
character_set_server=utf8
skip-character-set-client-handshake
2. Added to the JAVA_OPTS environment variable:
-Dfile.encoding=UTF-8
3. Added to the beginning of Tomcat's startup.bat (positioning is important!):
set JAVA_OPTS="-Dfile.encoding=UTF-8"
Thanks to everyone for their much appreciated help!
Bern
-----Original Message-----
From: Bernadette Houghton [mailto:bernadette.houghton@deakin.edu.au]
Sent: Monday, 31 August 2009 9:18 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: encoding problem
RE: encoding problem
Posted by Bernadette Houghton <be...@deakin.edu.au>.
Still having a few issues with encoding, although I've been able to resolve the particular issue below by just re-editing the affected record.
The other encoding issue is with Greek characters. With Solr turned off in our user-facing application, Greek characters, e.g. α, ω (small alpha, small omega), display correctly. But with Solr turned on, garbage displays instead. If we enter the characters as decimal character references (e.g. &#969;), everything displays OK with or without Solr. Does this suggest anything to anyone?
TIA
bern
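A note on why the decimal form survives: a decimal character reference such as &#969; consists only of ASCII characters, and ASCII bytes are the same in UTF-8 and in the single-byte Windows and ISO code pages. A mis-decoding step anywhere on the Solr path therefore cannot damage it, whereas a raw ω (two bytes in UTF-8) can be split into two wrong characters. The sketch below illustrates the conversion; the class and method names are made up for this example and are not part of Solr or of the application discussed in this thread.

```java
final class NumericEntities {
    // Replace every code point above 127 with a decimal character reference,
    // leaving plain ASCII untouched.
    static String toDecimalReferences(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u03B1 is small alpha, \u03C9 is small omega (escaped so that the
        // source file itself does not depend on any particular encoding).
        System.out.println(toDecimalReferences("\u03B1,\u03C9"));
    }
}
```

This is a workaround rather than a fix: it hides the mis-decoding step instead of removing it.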
-----Original Message-----
From: Bernadette Houghton [mailto:bernadette.houghton@deakin.edu.au]
Sent: Friday, 28 August 2009 9:31 AM
To: 'solr-user@lucene.apache.org'; 'yonik@lucidimagination.com'
Subject: RE: encoding problem
RE: encoding problem
Posted by Bernadette Houghton <be...@deakin.edu.au>.
Shalin, the XML from solr admin for the relevant field is displaying as -
<str name="citation_t"><a title="Browse by Author Name for Moncrieff, Joan" href="/fez/list/author/Moncrieff%2C+Joan/">Moncrieff, Joan</a>, <a title="Browse by Author Name for Macauley, Peter" href="/fez/list/author/Macauley%2C+Peter/">Macauley, Peter</a> and <a title="Browse by Author Name for Epps, Janine" href="/fez/list/author/Epps%2C+Janine/">Epps, Janine</a> <a title="Browse by Year 2006" href="/fez/list/year/2006/">2006</a>, <a title="Click to view Journal, Media Article: “My Universe is Here”: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers" href="/fez/view/changeme:156">“My Universe is Hereâ€Â�: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers</a><i></i>, vol. 38, no. 2, pp. 71-83.</str>
The weird thing is that the title displays OK in one place, but not in the "href" bit.
bern
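The garbled link text above is consistent with UTF-8 bytes having been decoded as windows-1252 at least once somewhere between the database, the indexer and the browser. A left curly quote is three bytes in UTF-8 (E2 80 9C), and windows-1252 reads those three bytes as three separate characters. The right curly quote ends in byte 9D, which has no assignment in windows-1252, which may be why the closing quote degrades further. The sketch below reproduces one round of that mistake and reverses it; the class and method names are illustrative only, and it assumes the JDK in use ships the windows-1252 charset (standard JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode as UTF-8, then (wrongly) decode the bytes as windows-1252.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Reverse one round of that mistake. This only works while every
    // garbled character still maps back to its original byte.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String quote = "\u201C";          // left double quotation mark
        String once = garble(quote);      // one character becomes three
        String twice = garble(once);      // a second round makes it worse
        System.out.println(once.length());
        System.out.println(twice.length());
        System.out.println(repair(once).equals(quote));
    }
}
```

If a stored value grows in length every time it passes through a component, that component is a likely place for the wrong decoding step.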
Re: encoding problem
Posted by Yonik Seeley <yo...@lucidimagination.com>.
Have you determined if the problem is on the indexing side or the
query side? I don't see any reason you should have to set/change any
encoding in the JVM.
-Yonik
http://www.lucidimagination.com
On Thu, Aug 27, 2009 at 7:03 PM, Bernadette
Houghton<be...@deakin.edu.au> wrote:
> Hi Shalin, strangely, things still aren't working. I've tried setting JAVA_OPTS through the GUI and in startup.bat, with absolutely no impact. I have also tried reindexing, again with no impact; results still look like this:
>
> “My Universe is Here�
>
> bern
>
> -----Original Message-----
> From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com]
> Sent: Wednesday, 26 August 2009 5:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: encoding problem
>
> On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton <
> bernadette.houghton@deakin.edu.au> wrote:
>
>> Thanks for your quick reply, Shalin.
>>
>> Tomcat is running on my Windows machine, but does not appear in Windows
>> Services (as I was expecting it should ... am I wrong?). I'm running it from
>> a startup.bat on my desktop - see below. Do I add the Dfile line to the
>> startup.bat?
>>
>> SOLR is part of the repository software that we are running.
>>
>
> Tomcat respects an environment variable called JAVA_OPTS through which you
> can pass any jvm argument (e.g. heap size, file encoding). Set
> JAVA_OPTS="-Dfile.encoding=UTF-8" either through the GUI or by adding the
> following to startup.bat:
>
> set JAVA_OPTS="-Dfile.encoding=UTF-8"
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
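Yonik's remark that the JVM encoding should not need changing holds for code that names its charset explicitly wherever bytes become characters or the reverse: such code behaves the same whatever -Dfile.encoding is set to. The sketch below shows that idea in isolation. It is not taken from Solr or Tomcat, and the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

final class ExplicitCharset {
    // Always name the charset; never rely on the platform default.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String omega = "\u03C9";                    // small omega
        byte[] bytes = encode(omega);               // two bytes in UTF-8
        System.out.println(bytes.length);
        System.out.println(decode(bytes).equals(omega));
    }
}
```

By contrast, the no-argument forms `text.getBytes()` and `new String(data)` use the JVM default charset, which is the setting that -Dfile.encoding influences.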
RE: encoding problem
Posted by Bernadette Houghton <be...@deakin.edu.au>.
Hi Shalin, strangely, things still aren't working. I've tried setting JAVA_OPTS through the GUI and in startup.bat, with absolutely no impact. I have also tried reindexing, again with no impact; results still look like this:
“My Universe is Here�
bern
Re: encoding problem
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton <
bernadette.houghton@deakin.edu.au> wrote:
> Thanks for your quick reply, Shalin.
>
> Tomcat is running on my Windows machine, but does not appear in Windows
> Services (as I was expecting it should ... am I wrong?). I'm running it from
> a startup.bat on my desktop - see below. Do I add the Dfile line to the
> startup.bat?
>
> SOLR is part of the repository software that we are running.
>
Tomcat respects an environment variable called JAVA_OPTS through which you
can pass any jvm argument (e.g. heap size, file encoding). Set
JAVA_OPTS="-Dfile.encoding=UTF-8" either through the GUI or by adding the
following to startup.bat:
set JAVA_OPTS="-Dfile.encoding=UTF-8"
--
Regards,
Shalin Shekhar Mangar.
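One way to check whether the flag actually reached a JVM is to print the relevant values from a small Java program started with the same option. This is a minimal sketch with an illustrative class name; it reports on the JVM it runs in, so it confirms the syntax of the option rather than proving what the Tomcat process received.

```java
import java.nio.charset.Charset;

final class ShowDefaultCharset {
    // Report both the system property and the charset the JVM settled on.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as `java -Dfile.encoding=UTF-8 ShowDefaultCharset` and again without the option shows whether the setting changes the default on a given machine.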
RE: encoding problem
Posted by Bernadette Houghton <be...@deakin.edu.au>.
Thanks for your quick reply, Shalin.
Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the Dfile line to the startup.bat?
SOLR is part of the repository software that we are running.
Thanks!
BERN
Startup.bat -
@echo off
if "%OS%" == "Windows_NT" setlocal
rem ---------------------------------------------------------------------------
rem Start script for the CATALINA Server
rem
rem $Id: startup.bat 302918 2004-05-27 18:25:11Z yoavs $
rem ---------------------------------------------------------------------------
rem Guess CATALINA_HOME if not defined
set CURRENT_DIR=%cd%
if not "%CATALINA_HOME%" == "" goto gotHome
set CATALINA_HOME=%CURRENT_DIR%
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
cd ..
set CATALINA_HOME=%cd%
cd %CURRENT_DIR%
:gotHome
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
echo The CATALINA_HOME environment variable is not defined correctly
echo This environment variable is needed to run this program
goto end
:okHome
set EXECUTABLE=%CATALINA_HOME%\bin\catalina.bat
rem Check that target executable exists
if exist "%EXECUTABLE%" goto okExec
echo Cannot find %EXECUTABLE%
echo This file is needed to run this program
goto end
:okExec
rem Get remaining unshifted command line arguments and save them in the
set CMD_LINE_ARGS=
:setArgs
if ""%1""=="""" goto doneSetArgs
set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
shift
goto setArgs
:doneSetArgs
call "%EXECUTABLE%" start %CMD_LINE_ARGS%
:end
Re: encoding problem
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton <
bernadette.houghton@deakin.edu.au> wrote:
> Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I
> access the JVM???
>
When you execute the java executable, just add -Dfile.encoding=UTF-8 as a
command line argument to the executable.
How are you consuming Solr? You mentioned there is no Tomcat; is your Solr
client a desktop Java application?
--
Regards,
Shalin Shekhar Mangar.
RE: encoding problem
Posted by Bernadette Houghton <be...@deakin.edu.au>.
Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access the JVM???
Regards
Bern