Posted to solr-dev@lucene.apache.org by "Koji Sekiguchi (JIRA)" <ji...@apache.org> on 2007/04/25 14:14:15 UTC
[jira] Created: (SOLR-214) deficit of InputStreamReader support in
anonymous class of ContentStream
deficit of InputStreamReader support in anonymous class of ContentStream
------------------------------------------------------------------------
Key: SOLR-214
URL: https://issues.apache.org/jira/browse/SOLR-214
Project: Solr
Issue Type: Bug
Reporter: Koji Sekiguchi
After SOLR-197 was applied, POSTed Japanese XML content is garbled in the index.
I can see the garbled characters in Luke. The issue never occurred before SOLR-197.
The cause is that the anonymous ContentStream class created in SolrRequestParsers.parseParamsAndFillStreams() does not use an InputStreamReader.
Before SOLR-197, XmlUpdateRequestHandler.handleRequestBody() used an InputStreamReader:
    // Cycle through each stream
    for( ContentStream stream : req.getContentStreams() ) {
      String charset = getCharsetFromContentType( stream.getContentType() );
      Reader reader = null;
      if( charset == null ) {
        reader = new InputStreamReader( stream.getStream() );
      }
      else {
        reader = new InputStreamReader( stream.getStream(), charset );
      }
      rsp.add( "update", this.update( reader ) );
      // Make sure it's closed
      try { reader.close(); } catch( Exception ex ){}
    }
The patch applies the same handling in SolrRequestParsers.
regards,
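
The charset handling described in the report can be sketched as a standalone Java program. This is a minimal illustration, not the attached patch: the class name CharsetAwareReader and the helpers charsetFromContentType(), openReader() and readAll() are hypothetical, and the Content-Type parsing is a simplified assumption about what getCharsetFromContentType() does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetAwareReader {

    // Hypothetical helper: returns the charset named in a Content-Type
    // value such as "text/xml; charset=UTF-8", or null if none is given.
    public static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Wraps the raw stream in an InputStreamReader, using the declared
    // charset when there is one and the platform default otherwise.
    public static Reader openReader(InputStream in, String contentType) throws IOException {
        String charset = charsetFromContentType(contentType);
        return charset == null
                ? new InputStreamReader(in)
                : new InputStreamReader(in, charset);
    }

    // Reads the whole reader into a String and closes it.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Three Japanese characters, written as escapes so the source file
        // encoding does not matter.
        String text = "\u65e5\u672c\u8a9e";
        byte[] body = ("<add><doc><field name=\"id\">" + text + "</field></doc></add>")
                .getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(openReader(new ByteArrayInputStream(body), "text/xml; charset=UTF-8"));
        System.out.println(decoded.contains(text));
    }
}
```

When no charset is declared, the sketch falls back to the platform default, as the original loop does, so the result then depends on the JVM's default encoding.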
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-214) deficit of InputStreamReader support
in anonymous class of ContentStream
Posted by "Toru Matsuzawa (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491926 ]
Toru Matsuzawa commented on SOLR-214:
-------------------------------------
This problem can be reproduced with Tomcat 5.5.23.
It occurred with "/update" before the SOLR-197 change.
The reader returned by stream.getReader() is an org.apache.catalina.connector.CoyoteReader.
CoyoteReader uses org.apache.catalina.connector.InputBuffer#realReadBytes().
realReadBytes() reads the input in byte order.
As a result, the characters in the index are garbled.
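
The kind of garbling described here can be reproduced outside the container. The sketch below assumes the UTF-8 bytes end up decoded as ISO-8859-1; that charset is an assumption for illustration, since the thread does not say which charset Tomcat actually applied. The class name MojibakeDemo and its decode() helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Decodes raw bytes with the given charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters, each encoded as three bytes in UTF-8.
        String original = "\u65e5\u672c\u8a9e";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Assumed failure mode: every byte becomes its own character.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        // Correct decoding: the declared charset is honored.
        String right = decode(utf8, StandardCharsets.UTF_8);

        System.out.println(wrong.length());          // prints 9
        System.out.println(right.equals(original));  // prints true
    }
}
```

With the wrong charset the three characters turn into nine unrelated ones, which is what a garbled index entry looks like in Luke.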
[jira] Assigned: (SOLR-214) deficit of InputStreamReader support in
anonymous class of ContentStream
Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McKinley reassigned SOLR-214:
----------------------------------
Assignee: Ryan McKinley
[jira] Commented: (SOLR-214) deficit of InputStreamReader support
in anonymous class of ContentStream
Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491711 ]
Ryan McKinley commented on SOLR-214:
------------------------------------
Weird - the javadocs are pretty explicit that request.getReader() should take care of the character encoding:
http://java.sun.com/javaee/5/docs/api/javax/servlet/ServletRequest.html#getReader()
What app server are you running?
Does this happen when you use /update from the servlet (when /update is not mapped in solrconfig.xml)?
SolrUpdateServlet.java has always used getReader().
[jira] Closed: (SOLR-214) deficit of InputStreamReader support in
anonymous class of ContentStream
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Sekiguchi closed SOLR-214.
-------------------------------
Resolution: Invalid
Close as invalid. The servlet container should take care of character encoding.
[jira] Commented: (SOLR-214) deficit of InputStreamReader support
in anonymous class of ContentStream
Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491746 ]
Ken Krugler commented on SOLR-214:
----------------------------------
There's some complex interplay of the content-type in the request, the charset (if any) in the request, and the container being used. So some interesting questions are:
# exactly how the content is being posted (e.g. via the example script?)
# what request header values are being sent along with the post.
# what servlet container (and version) is being used.
[jira] Resolved: (SOLR-214) deficit of InputStreamReader support in
anonymous class of ContentStream
Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McKinley resolved SOLR-214.
--------------------------------
Resolution: Fixed
added in rev 536019
[jira] Updated: (SOLR-214) deficit of InputStreamReader support in
anonymous class of ContentStream
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Sekiguchi updated SOLR-214:
--------------------------------
Attachment: UseInputStreamReader.patch
The patch is attached.
[jira] Commented: (SOLR-214) deficit of InputStreamReader support
in anonymous class of ContentStream
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491938 ]
Koji Sekiguchi commented on SOLR-214:
-------------------------------------
> Weird - the javadocs are pretty explicit that request.getReader() should take care of the character encoding:
> http://java.sun.com/javaee/5/docs/api/javax/servlet/ServletRequest.html#getReader()
Good point. I simply thought the cause of this problem was the lack of InputStreamReader support in SOLR-197.
But according to the javadoc, the servlet container should take care of encoding. We are using Tomcat 5.5.23. We should check the servlet container. Thanks.
[jira] Commented: (SOLR-214) deficit of InputStreamReader support
in anonymous class of ContentStream
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494151 ]
Koji Sekiguchi commented on SOLR-214:
-------------------------------------
For now, to avoid this problem, we are looking into using a servlet filter as a workaround.
But we would be happy if Solr handled character encoding explicitly. We are using Tomcat 5.5.23.
[jira] Reopened: (SOLR-214) deficit of InputStreamReader support in
anonymous class of ContentStream
Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McKinley reopened SOLR-214:
--------------------------------
Without this patch, Resin balks at UTF-8 input:
http://www.nabble.com/UTF-8-problem-with-Resin-tf3704271.html
If Resin and Tomcat don't handle "getReader()" correctly, maybe we should handle it explicitly.