Posted to solr-user@lucene.apache.org by "Glock, Thomas" <th...@pfizer.com> on 2009/10/24 17:28:51 UTC

Solr under tomcat - UTF-8 issue

Hoping someone can help -

Problem: 
	Querying for non-English phrases such as Добавить does not return any results under Tomcat, but it does work when using the Jetty example.  

	Both tomcat and jetty are being queried by the same custom (flash) client and both reference the same solr/data/index.  

	I'm using an HTTP POST rather than an HTTP GET to query Solr.  I believe the problem must be in how Tomcat is configured and had hoped that -Dfile.encoding=UTF-8 would solve it - but no luck.  I've stopped and restarted Tomcat and deleted the work directory as well.

	Results are the same in both IE6 and Firefox and I've used both firebug and fiddler to view the http request/responses.  It is consistent - jetty works, tomcat does not.

Environment:
	Tomcat 6 as a service on WinXP Professional 2002 sp 2 
	Tomcat Java properties -

	-Dcatalina.home=C:\Program Files\Apache Software Foundation\Tomcat 6.0
	-Dcatalina.base=C:\Program Files\Apache Software Foundation\Tomcat 6.0
	-Djava.endorsed.dirs=C:\Program Files\Apache Software Foundation\Tomcat 6.0\endorsed
	-Djava.io.tmpdir=C:\Program Files\Apache Software Foundation\Tomcat 6.0\temp
	-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
	-Djava.util.logging.config.file=C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\logging.properties
	-Dfile.encoding=UTF-8

Thanks in advance.
Tom Glock


RE: Solr under tomcat - UTF-8 issue

Posted by "Glock, Thomas" <th...@pfizer.com>.
As it turns out I'm back to GET myself.  

I just noticed that Tomcat setting as well, although I ultimately plan to
run under WebLogic (I'm not sure what the URL length limit is there, or
whether the Flex client has limits of its own when doing the GET).

Reading the book (page 108) I noticed that my queries need to have more
fq=field:value params.

Earlier I had incorrectly defined a single fq param with a bunch of
criteria such as fq=field_1:value AND field_2:value_2 AND (role:r1 or
role:r2 or role:r3 or role:r4)

Apparently the boolean role clause can be specified as:

  fq=role:(r1 || r2 || r3 || r4) 

as opposed to: 

  ... AND (role:r1 or role:r2 or role:r3 or role:r4) 

That syntax shortens queries too.  Note I haven't tested yet against a
smaller set of docs to be sure the new query syntax works...
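
For anyone assembling that grouped clause in code, here is a minimal Java
sketch. It is not from the thread: the class and method names are made up
for illustration, it writes the operator as OR (which the Lucene query
parser treats the same as ||), and it assumes role values are simple tokens
that need no query escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or of the poster's client.
final class RoleFilterSketch {

    /** Builds a grouped filter such as role:(r1 OR r2 OR r3). */
    static String roleFilter(String field, List<String> roles) {
        if (roles.isEmpty()) {
            throw new IllegalArgumentException("at least one role is required");
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < roles.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            // Assumption: role names contain no query-syntax characters.
            sb.append(roles.get(i));
        }
        return sb.append(')').toString();
    }

    /** Percent-encodes the filter as UTF-8 so it can be sent as an fq parameter. */
    static String asFqParam(String filter) {
        try {
            return "fq=" + URLEncoder.encode(filter, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String filter = roleFilter("role", Arrays.asList("r1", "r2", "r3", "r4"));
        System.out.println(filter);            // role:(r1 OR r2 OR r3 OR r4)
        System.out.println(asFqParam(filter)); // fq=role%3A%28r1+OR+r2+OR+r3+OR+r4%29
    }
}
```

Keeping the role clause in its own fq parameter, separate from the other
criteria, is what the shorter syntax above is meant to allow.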

-----Original Message-----
From: markwaddle [mailto:mark@markwaddle.com] 
Sent: Monday, October 26, 2009 2:12 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr under tomcat - UTF-8 issue


I was originally using POST for the same reason, however I discovered
that Tomcat could easily be configured to accept any length URI. All it
requires is specifying the maxHttpHeaderSize attribute in your default
Connector in server.xml. I set my value to 1MB, which is certainly
excessive, but it ensures I will never hit the limit. As the other chap
mentioned, I now have the benefits of caching and most importantly,
proper web logs!

I also have a similar situation where I constrain the search results
based on the user's role. I have only two roles to support, so my case
is very simple, but I could imagine having a multivalued "role" field
that you could perform facet queries on.

Mark




RE: Solr under tomcat - UTF-8 issue

Posted by markwaddle <ma...@markwaddle.com>.
I was originally using POST for the same reason, however I discovered that
Tomcat could easily be configured to accept any length URI. All it requires
is specifying the maxHttpHeaderSize attribute in your default Connector in
server.xml. I set my value to 1MB, which is certainly excessive, but it
ensures I will never hit the limit. As the other chap mentioned, I now have
the benefits of caching and most importantly, proper web logs!
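
To get a feel for how close a long GET comes to that limit, the request
line can be measured before sending. This is only a rough sketch: the class
and method names are hypothetical, 8192 bytes is assumed as the value in
effect when maxHttpHeaderSize is not set, and the allowance for the other
request headers is a number the caller has to estimate.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for estimating whether a GET fits in the header buffer.
final class RequestLineBudgetSketch {

    /** Assumed limit when maxHttpHeaderSize is not configured, in bytes. */
    static final int ASSUMED_DEFAULT_MAX_HEADER = 8192;

    /** Size of "GET <path?query> HTTP/1.1" plus the trailing CRLF. */
    static int requestLineBytes(String pathAndQuery) {
        String line = "GET " + pathAndQuery + " HTTP/1.1\r\n";
        return line.getBytes(StandardCharsets.US_ASCII).length;
    }

    /** True if the request line plus an estimate for the other headers fits. */
    static boolean fits(String pathAndQuery, int otherHeaderBytes, int maxHeaderSize) {
        return requestLineBytes(pathAndQuery) + otherHeaderBytes <= maxHeaderSize;
    }

    public static void main(String[] args) {
        // Example path only; a real query string must already be percent-encoded.
        String path = "/solr/select?q=holiday&fq=role%3A%28r1+OR+r2%29";
        System.out.println(requestLineBytes(path));
        System.out.println(fits(path, 1024, ASSUMED_DEFAULT_MAX_HEADER));
    }
}
```

Percent-encoded non-ASCII text grows quickly (each Cyrillic letter becomes
six characters), so the encoded form is the one to measure.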

I also have a similar situation where I constrain the search results based
on the user's role. I have only two roles to support, so my case is very
simple, but I could imagine having a multivalued "role" field that you could
perform facet queries on.

Mark


Glock, Thomas wrote:
> 
> Thanks -
> 
> I agree.  However my application requires results be trimmed to users
> based on roles.  The roles are repeating values on the documents.  Users
> have many different role combinations as do documents.
> I recognize this is going to hamper caching - but using a GET will tend to
> limit the size of search phrases when combined with the boolean role
> clause.  And I am concerned with hitting url limits.
> 
> At any rate I solved it thanks to Yonik's recommendation.  
> 
> My flex client httpservice by default only sets the content-type request
> header to  "application/x-www-form-urlencoded"  what it needed to do for
> tomcat is set the content-type request header to content-type =
> "application/x-www-form-urlencoded; charset=UTF-8"; 
> 
> If you have any suggestions regarding limiting results based on user and
> document role permutations - I'm all ears.  I've been to the Search Summit
> in NYC and no vendor could even seem to grasp the concept.  
> 
> The problem case statement is this  - I have users globally who need to
> search for content tailored to them.  Users searching for 'Holiday' don't
> get any value from 10000 documents having the word holiday. What they need
> are documents authored for that population.  The documents have the
> associated role information as metadata and therefore users will get only
> the documents they have access to and are relevant to them.  That's the
> plan anyway!  
> 
> By chance I stumbled in Solr a month or so ago and I think its awesome.  I
> got the book two days ago too - fantastic!
> 
> Thanks again,
> Tom
> 

-- 
View this message in context: http://www.nabble.com/Solr-under-tomcat---UTF-8-issue-tp26040052p26054942.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr under tomcat - UTF-8 issue

Posted by Sven Maurmann <sv...@kippdata.de>.
Hi,

I did not read the original mail, but for the UTF-8 issue with Tomcat
you might consult the url http://wiki.apache.org/solr/SolrTomcat

The relevant piece of information is under "URI Charset Config":

*** quote ***
Edit Tomcat's conf/server.xml and add the following attribute to the correct
Connector element: URIEncoding="UTF-8".

<Server ...>
 <Service ...>
   <Connector ... URIEncoding="UTF-8"/>
     ...
 </Service>
</Server>
*** end quote ***

Sven


--On Friday, 22 January 2010 23:41 +0100 Frank Wesemann 
<f....@fotofinder.net> wrote:

> Glock, Thomas wrote:
>>
>> My flex client httpservice by default only sets the content-type request
>> header to  "application/x-www-form-urlencoded"  what it needed to do for
>> tomcat is set the content-type request header to content-type =
>> "application/x-www-form-urlencoded; charset=UTF-8";
>>
>>
>>
> As some browsers do not send this particular content-type correctly ( at
> least Firefox and Safari skip the "charset=utf-8" part),
> I added a servlet.Filter :
>
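> import java.io.IOException;
> import javax.servlet.*;
>
> public class RequestCharset2utf8Filter implements javax.servlet.Filter {
> ...
> 	// Must run before anything reads a request parameter, otherwise the setting has no effect.
> 	public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
> 			throws IOException, ServletException {
> 		req.setCharacterEncoding("UTF-8");
> 		chain.doFilter(req, res);
> 	}
> }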
>
> as the first filter to my webapp:
> in web.xml:
>
>   <filter>
>       <filter-name>CharsetEncodingFilter</filter-name>
>       <filter-class>my.package.servlet.RequestCharset2utf8Filter</filter-class>
>   </filter>
>   <filter-mapping>
>    <filter-name>CharsetEncodingFilter</filter-name>
>    <url-pattern>/*</url-pattern>
>   </filter-mapping>
>
>
> I run it on tomcat 6.0.18 .
>
> And:
> wunder is of course right, but life isn't all beer and skittles.
>
> --
> Kind regards,
>
> Frank Wesemann
> Fotofinder GmbH         VAT ID: DE812854514
> Software Development    Web: http://www.fotofinder.com/
> Potsdamer Str. 96       Tel: +49 30 25 79 28 90
> 10785 Berlin            Fax: +49 30 25 79 28 999
>
> Registered office: Berlin
> Commercial register: Amtsgericht Berlin Charlottenburg (HRB 73099)
> Managing director: Ali Paczensky

Re: Solr under tomcat - UTF-8 issue

Posted by Frank Wesemann <f....@fotofinder.net>.
Glock, Thomas wrote:
>
> My flex client httpservice by default only sets the content-type request header to  "application/x-www-form-urlencoded"  what it needed to do for tomcat is set the content-type request header to content-type = "application/x-www-form-urlencoded; charset=UTF-8"; 
>
>
>   
As some browsers do not send this particular content-type correctly ( at 
least Firefox and Safari skip the "charset=utf-8" part),
I added a servlet.Filter :

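import java.io.IOException;
import javax.servlet.*;

public class RequestCharset2utf8Filter implements javax.servlet.Filter {
...
	// Must run before anything reads a request parameter, otherwise the setting has no effect.
	public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
			throws IOException, ServletException {
		req.setCharacterEncoding("UTF-8");
		chain.doFilter(req, res);
	}
}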

as the first filter to my webapp:
in web.xml:

  <filter>
      <filter-name>CharsetEncodingFilter</filter-name>
      <filter-class>my.package.servlet.RequestCharset2utf8Filter</filter-class>
  </filter>
  <filter-mapping>
   <filter-name>CharsetEncodingFilter</filter-name>
   <url-pattern>/*</url-pattern>
  </filter-mapping>


I run it on tomcat 6.0.18 .

And:
wunder is of course right, but life isn't all beer and skittles.

-- 
Kind regards,

Frank Wesemann
Fotofinder GmbH         VAT ID: DE812854514
Software Development    Web: http://www.fotofinder.com/
Potsdamer Str. 96       Tel: +49 30 25 79 28 90
10785 Berlin            Fax: +49 30 25 79 28 999

Registered office: Berlin
Commercial register: Amtsgericht Berlin Charlottenburg (HRB 73099)
Managing director: Ali Paczensky




RE: Solr under tomcat - UTF-8 issue

Posted by "Glock, Thomas" <th...@pfizer.com>.
Thanks -

I agree.  However, my application requires that results be trimmed to users based on roles.  The roles are repeating values on the documents.  Users have many different role combinations, as do documents.
I recognize this is going to hamper caching - but using a GET will tend to limit the size of search phrases when combined with the boolean role clause.  And I am concerned about hitting URL limits.

At any rate I solved it thanks to Yonik's recommendation.  

By default, my Flex client's HTTPService only sets the Content-Type request header to "application/x-www-form-urlencoded".  For Tomcat, it needed to set the Content-Type request header to "application/x-www-form-urlencoded; charset=UTF-8".
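
Why the charset parameter makes the difference can be shown without a
server. The servlet specification has a container fall back to ISO-8859-1
for the request body when the client declares no charset; assuming Tomcat
applied that default here, the percent-encoded bytes of the query would be
decoded one byte per character instead of as UTF-8. The sketch below (class
name made up for illustration) decodes the same bytes both ways.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative only: mimics decoding a form value with two different charsets.
final class FormCharsetSketch {

    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The first word of the query from this thread, as sent on the wire.
        String q = "%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C";

        String asUtf8 = decode(q, "UTF-8");
        String asLatin1 = decode(q, "ISO-8859-1");

        // Eight Cyrillic letters when decoded as UTF-8.
        System.out.println(asUtf8.length());   // 8
        // Sixteen unrelated Latin-1 characters otherwise, so nothing matches the index.
        System.out.println(asLatin1.length()); // 16
        System.out.println(asUtf8.equals(asLatin1)); // false
    }
}
```

A GET is unaffected by this header because the query string is decoded
according to the connector's URIEncoding setting rather than the body
charset.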

If you have any suggestions regarding limiting results based on user and document role permutations - I'm all ears.  I've been to the Search Summit in NYC and no vendor could even seem to grasp the concept.  

The problem case statement is this  - I have users globally who need to search for content tailored to them.  Users searching for 'Holiday' don't get any value from 10000 documents having the word holiday. What they need are documents authored for that population.  The documents have the associated role information as metadata and therefore users will get only the documents they have access to and are relevant to them.  That's the plan anyway!  

By chance I stumbled on Solr a month or so ago and I think it's awesome.  I got the book two days ago too - fantastic!

Thanks again,
Tom

-----Original Message-----
From: Walter Underwood [mailto:wunder@wunderwood.org] 
Sent: Saturday, October 24, 2009 1:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr under tomcat - UTF-8 issue

Don't use POST. That is the wrong HTTP semantic for search results.  
Use GET. That will make it possible to cache the results, will make your HTTP logs useful, and all sorts of other good things.

wunder



Re: Solr under tomcat - UTF-8 issue

Posted by Walter Underwood <wu...@wunderwood.org>.
Don't use POST. That is the wrong HTTP semantic for search results.  
Use GET. That will make it possible to cache the results, will make  
your HTTP logs useful, and all sorts of other good things.

wunder

On Oct 24, 2009, at 10:11 AM, Glock, Thomas wrote:

>
> Thanks - I now think it must be due to my client not sending enough  
> ( or correct ) headers in the request.
>
> Tomcat does work when using an HTTP GET but is failing the POST from  
> my flash client.
>
> For example putting this in both firefox and IE browsers url works  
> correctly:
>
> http://localhost:8080/hranswers/elevate?fl=*%20score&indent=on&start=0&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2&fq=language_cd:ru&rows=20
>
> The POST information my client is sending looks like this and it  
> fails:
>
> POST /hranswers/elevate HTTP/1.1
> Accept: */*
> Accept-Language: en-US
> x-flash-version: 10,0,32,18
> Content-Type: application/x-www-form-urlencoded
> Content-Encoding: UTF-8
> Content-Length: 209
> Accept-Encoding: gzip, deflate
> User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;  
> SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR  
> 3.0.04506.648; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR  
> 3.5.30729; UserABC123)
> Host: localhost:8080
> Connection: Keep-Alive
> Pragma: no-cache
>
> fq=language%5Fcd%3Aru&rows=20&start=0&fl=%2A%20score&indent=on&q= 
> %D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE 
> %D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD 
> %D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2
>
> I will keep digging - and let you know how it turns out.
>
> Thanks!
>
>
> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of  
> Yonik Seeley
> Sent: Saturday, October 24, 2009 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr under tomcat - UTF-8 issue
>
> Try using example/exampledocs/test_utf8.sh to narrow down if the  
> charset problems you're hitting are due to servlet container  
> configuration.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> 2009/10/24 Glock, Thomas <th...@pfizer.com>:
>>
>> Thanks but not working...
>>
>> I did have the URIEncoding in place and just again moved the  
>> URIEncoding attribute to be the first attribute - ensured I saved  
>> server.xml, shut down tomcat, deleted logs and cache and still no  
>> luck....  It's probably something very simple and I'm just missing it.
>>
>> Thanks for your help.
>>
>>
>> -----Original Message-----
>> From: Zsolt Czinkos [mailto:czinkos@gmail.com]
>> Sent: Saturday, October 24, 2009 11:36 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr under tomcat - UTF-8 issue
>>
>> Hello
>>
>> Have you set URIEncoding attribute to UTF-8 in tomcat's server.xml  
>> (on connector element)?
>>
>> Like:
>>
>> <Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8080"
>> protocol="HTTP/1.1" redirectPort="8443"/>
>>
>> Hope this helps.
>>
>> Best regards
>>
>> czinkos
>>
>>
>>
>


RE: Solr under tomcat - UTF-8 issue

Posted by "Glock, Thomas" <th...@pfizer.com>.
 
Thanks - I now think it must be due to my client not sending enough ( or correct ) headers in the request.

Tomcat does work when using an HTTP GET but is failing the POST from my flash client. 

For example, entering this URL in both Firefox and IE works correctly:

http://localhost:8080/hranswers/elevate?fl=*%20score&indent=on&start=0&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2&fq=language_cd:ru&rows=20

The POST information my client is sending looks like this and it fails:

POST /hranswers/elevate HTTP/1.1
Accept: */*
Accept-Language: en-US
x-flash-version: 10,0,32,18
Content-Type: application/x-www-form-urlencoded
Content-Encoding: UTF-8
Content-Length: 209
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; UserABC123)
Host: localhost:8080
Connection: Keep-Alive
Pragma: no-cache

fq=language%5Fcd%3Aru&rows=20&start=0&fl=%2A%20score&indent=on&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2
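
Two things stand out in the request above: the Content-Type header carries
no charset parameter, and Content-Encoding is the header for codings such as
gzip, so it does not tell the server how the form text is encoded. As a
hedged illustration of a client that declares the charset, here is a small
Java sketch. The class and method names are hypothetical, it is not the
Flex client's code, and the target URL is whatever is passed on the command
line.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client sketch: POSTs a form and states its charset explicitly.
final class Utf8FormPostSketch {

    static final String CONTENT_TYPE = "application/x-www-form-urlencoded; charset=UTF-8";

    /** Percent-encodes the parameters as UTF-8, in insertion order. */
    static String formBody(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
        return sb.toString();
    }

    /** Sends the form and returns the HTTP status code. */
    static int post(URL url, Map<String, String> params) throws IOException {
        // The encoded body is pure ASCII, so its byte length equals its char length.
        byte[] body = formBody(params).getBytes("US-ASCII");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", CONTENT_TYPE);
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "\u0414\u043e\u0431\u0430\u0432\u0438\u0442\u044c");
        params.put("fq", "language_cd:ru");
        System.out.println(formBody(params));
        if (args.length > 0) {
            // Only contacts a server when a URL is supplied.
            System.out.println(post(new URL(args[0]), params));
        }
    }
}
```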

I will keep digging - and let you know how it turns out.

Thanks!


-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: Saturday, October 24, 2009 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr under tomcat - UTF-8 issue

Try using example/exampledocs/test_utf8.sh to narrow down if the charset problems you're hitting are due to servlet container configuration.

-Yonik
http://www.lucidimagination.com


2009/10/24 Glock, Thomas <th...@pfizer.com>:
>
> Thanks but not working...
>
> I did have the URIEncoding in place and just again moved the URIEncoding attribute to be the first attribute - ensured I saved server.xml, shut down tomcat, deleted logs and cache and still no luck....  It's probably something very simple and I'm just missing it.
>
> Thanks for your help.
>
>
> -----Original Message-----
> From: Zsolt Czinkos [mailto:czinkos@gmail.com]
> Sent: Saturday, October 24, 2009 11:36 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr under tomcat - UTF-8 issue
>
> Hello
>
> Have you set URIEncoding attribute to UTF-8 in tomcat's server.xml (on connector element)?
>
> Like:
>
> <Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8080"
> protocol="HTTP/1.1" redirectPort="8443"/>
>
> Hope this helps.
>
> Best regards
>
> czinkos
>
>
>

Re: Solr under tomcat - UTF-8 issue

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Try using example/exampledocs/test_utf8.sh to narrow down if the
charset problems you're hitting are due to servlet container
configuration.

-Yonik
http://www.lucidimagination.com


2009/10/24 Glock, Thomas <th...@pfizer.com>:
>
> Thanks but not working...
>
> I did have the URIEncoding in place and just again moved the URIEncoding attribute to be the first attribute - ensured I saved server.xml, shut down tomcat, deleted logs and cache and still no luck....  It's probably something very simple and I'm just missing it.
>
> Thanks for your help.
>
>
> -----Original Message-----
> From: Zsolt Czinkos [mailto:czinkos@gmail.com]
> Sent: Saturday, October 24, 2009 11:36 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr under tomcat - UTF-8 issue
>
> Hello
>
> Have you set URIEncoding attribute to UTF-8 in tomcat's server.xml (on connector element)?
>
> Like:
>
> <Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8080"
> protocol="HTTP/1.1" redirectPort="8443"/>
>
> Hope this helps.
>
> Best regards
>
> czinkos
>
>
>

RE: Solr under tomcat - UTF-8 issue

Posted by "Glock, Thomas" <th...@pfizer.com>.
Thanks but not working...

I did have the URIEncoding in place and just again moved the URIEncoding attribute to be the first attribute - ensured I saved server.xml, shut down Tomcat, deleted logs and cache and still no luck....  It's probably something very simple and I'm just missing it.

Thanks for your help.


-----Original Message-----
From: Zsolt Czinkos [mailto:czinkos@gmail.com] 
Sent: Saturday, October 24, 2009 11:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr under tomcat - UTF-8 issue

Hello

Have you set URIEncoding attribute to UTF-8 in tomcat's server.xml (on connector element)?

Like:

<Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8080"
protocol="HTTP/1.1" redirectPort="8443"/>

Hope this helps.

Best regards

czinkos



Re: Solr under tomcat - UTF-8 issue

Posted by Zsolt Czinkos <cz...@gmail.com>.
Hello

Have you set URIEncoding attribute to UTF-8 in tomcat's server.xml (on
connector element)?

Like:

<Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8080"
protocol="HTTP/1.1" redirectPort="8443"/>

Hope this helps.

Best regards

czinkos

