Posted to solr-user@lucene.apache.org by Sujatha Arun <su...@gmail.com> on 2011/07/15 10:35:22 UTC

POST VS GET and NON English Characters

Hello,

We have implemented Solr search in several languages. Initially we used the
"GET" method for querying, but later moved to the "POST" method to accommodate
lengthy queries.

When we moved from GET to POST, the German characters could no
longer be searched, and I had to use the function utf8_decode in my
application for the search to work for German characters.

Currently I am doing this while querying using the POST method; we are using
the standard request handler:


$this->_queryterm=iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE",
$this->_queryterm);


This makes the query work for German characters and other languages, but it does
not work for certain characters in Lithuanian and Spanish. Examples:

Not working:

   - Iš
   - Estremadūros
   - sNaująjį
   - MEDŽIAGOTYRA
   - MEDŽIAGOS
   - taškuose

Working:

   - garbę
   - ieškoti
   - ispanų

Any ideas or input?

Regards
Sujatha
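
A minimal sketch (in Python rather than the PHP used in the message, purely for illustration) of why transcoding the query to ISO-8859-1 cannot work for every language. German umlauts have ISO-8859-1 code points, but the Lithuanian letters in the lists above do not, so a conversion to that charset has to drop or approximate them. The helper name `fits_latin1` and the German sample words are assumptions made for this example.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text has an ISO-8859-1 code point."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# German sample words (assumed for illustration, not taken from the thread).
german = ["Größe", "Übung", "Mädchen"]
# Terms quoted in the message above.
lithuanian = ["Iš", "Estremadūros", "MEDŽIAGOS", "garbę", "ieškoti", "ispanų"]

for word in german + lithuanian:
    print(f"{word!r}: representable in ISO-8859-1 = {fits_latin1(word)}")
```

Even the terms reported as working contain letters outside ISO-8859-1, so with //TRANSLIT they presumably reach Solr in an approximated form. Whether such a term still matches depends on how the index was analyzed, which the thread does not describe.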

Re: POST VS GET and NON English Characters

Posted by François Schiettecatte <fs...@gmail.com>.
You need to do something like this in Tomcat's ./conf/server.xml file:

    <Connector address="localhost" port="8000" protocol="HTTP/1.1"
               connectionTimeout="20000" URIEncoding="UTF-8" />

See 'URIEncoding' in http://tomcat.apache.org/tomcat-7.0-doc/config/http.html
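
As a rough illustration of why the URI decoding charset matters for GET requests, the sketch below (Python, standard library only) percent-encodes a term as UTF-8 and then decodes the same bytes with two different charsets. The sample word is an assumption for the example; the Tomcat 7 documentation linked above gives ISO-8859-1 as the default for URIEncoding.

```python
from urllib.parse import quote, unquote

term = "Größe"  # sample German word, assumed for illustration

# What a UTF-8 client puts into the query string.
encoded = quote(term, encoding="utf-8")
print(encoded)

# A server decoding the URI as UTF-8 recovers the original term.
print(unquote(encoded, encoding="utf-8"))

# A server decoding the same bytes as ISO-8859-1 sees mojibake instead.
print(repr(unquote(encoded, encoding="iso-8859-1")))
```

This only concerns the query string. A POST body is decoded separately, which is consistent with the GET and POST requests in this thread behaving differently.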

Note that this will assume the data is encoded in UTF-8 if (and ONLY if) the charset parameter is not set in the HTTP request Content-Type header. The header looks like this:

	Content-Type: text/plain; charset=UTF-8

Also note that most browsers encode data in ISO-8859-1 unless this is overridden in the browser settings or, if you are using a form, by the content type and charset set in the HTML. You can set this either in the HTTP response Content-Type header (like above) or as a meta tag like this:

	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
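
For a client that posts to Solr directly rather than through a browser form, a minimal sketch of sending the query as UTF-8 and declaring that charset in the request header might look like the following. It is written in Python for illustration only; the endpoint URL and the helper name `build_solr_post` are assumptions, and whether the server honors the declared charset depends on the servlet container and its configuration, which this thread does not settle.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Solr endpoint; adjust host, port and path to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_solr_post(query: str) -> Request:
    """Build (but do not send) a form-encoded POST whose body is UTF-8."""
    # Percent-encode the UTF-8 bytes of the query; the result is pure ASCII.
    body = urlencode({"q": query}, encoding="utf-8").encode("ascii")
    headers = {
        # Declare the body charset explicitly instead of relying on defaults.
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    }
    return Request(SOLR_SELECT_URL, data=body, headers=headers, method="POST")


request = build_solr_post("MEDŽIAGOS")
print(request.get_method())
print(request.get_header("Content-type"))
print(request.data)
```

Passing the returned object to `urllib.request.urlopen` would send it. The point of the sketch is that the query stays in UTF-8 end to end, so no lossy conversion to ISO-8859-1 is needed on the client side.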

Hope this helps.

Cheers

François



On Jul 20, 2011, at 7:20 AM, Sujatha Arun wrote:

> Paul,
> 
> I added the following line to catalina.sh and restarted the server, but this
> does not seem to help.
> 
> 
> JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8"
> Regards
> Sujatha
> 


Re: POST VS GET and NON English Characters

Posted by Sujatha Arun <su...@gmail.com>.
Paul,

I added the following line to catalina.sh and restarted the server, but this
does not seem to help.


JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8"
Regards
Sujatha

On Sun, Jul 17, 2011 at 3:51 AM, Paul Libbrecht <pa...@hoplahup.net> wrote:

> If you have the option, try setting the default charset of the
> servlet-container to utf-8.
> Typically this is done by setting a system property on startup.
>
> My experience has been that the default used to be UTF-8, but that is less and
> less the case, sometimes in a surprising way!
>
> paul

Re: POST VS GET and NON English Characters

Posted by Paul Libbrecht <pa...@hoplahup.net>.
If you have the option, try setting the default charset of the servlet-container to utf-8.
Typically this is done by setting a system property on startup.

My experience has been that the default used to be UTF-8, but that is less and less the case, sometimes in a surprising way!

paul


On 16 Jul 2011, at 05:34, Sujatha Arun wrote:

> It works fine with the GET method, but I am wondering why it does not with
> the POST method.
> 


Re: POST VS GET and NON English Characters

Posted by Sujatha Arun <su...@gmail.com>.
It works fine with the GET method, but I am wondering why it does not with the
POST method.

2011/7/15 pankaj bhatt <pa...@gmail.com>

> Hi Arun,
> This looks like an encoding issue to me.
> Can you change your browser settings to UTF-8 and hit the search URL
> via the GET method?
>
> We faced a similar problem with Chinese and Korean; this
> solved the problem.
>
> / Pankaj Bhatt.
>

Re: POST VS GET and NON English Characters

Posted by pankaj bhatt <pa...@gmail.com>.
Hi Arun,
This looks like an encoding issue to me.
Can you change your browser settings to UTF-8 and hit the search URL
via the GET method?

We faced a similar problem with Chinese and Korean; this
solved the problem.

/ Pankaj Bhatt.
