Posted to solr-dev@lucene.apache.org by "Lance Norskog (JIRA)" <ji...@apache.org> on 2007/11/14 20:59:43 UTC

[jira] Created: (SOLR-412) XsltWriter does not output UTF-8 by default

XsltWriter does not output UTF-8 by default
-------------------------------------------

                 Key: SOLR-412
                 URL: https://issues.apache.org/jira/browse/SOLR-412
             Project: Solr
          Issue Type: Bug
          Components: search
    Affects Versions: 1.2
         Environment: Tomcat 5.5
Linux Red Hat ES4  (2.6.9-5.ELsmp from 'uname -a')
            Reporter: Lance Norskog


XsltWriter outputs XML text in ISO-8859-1 encoding by default.

Tomcat 5.5 has URIEncoding="UTF-8" set in the <Connector> element as described in the Wiki.

This output description in the XML: 

<xsl:output method="xml" encoding="utf-8" />

gives output with this header:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/xml;charset=ISO-8859-1
Transfer-Encoding: chunked
Date: Wed, 14 Nov 2007 17:49:11 GMT

I had to change the <xsl:output> directive to this:

 <xsl:output media-type="text/xml; charset=UTF-8" encoding="UTF-8"/>

This is the root cause of SOLR-233.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Age Jan Kuperus (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773501#action_12773501 ] 

Age Jan Kuperus edited comment on SOLR-412 at 11/4/09 2:54 PM:
---------------------------------------------------------------

IMHO the documentation in xslt 1.0 (http://www.w3.org/TR/xslt#output) is a bit clearer on the usage of these fields:

"The method attribute on xsl:output identifies the overall method that should be used for outputting the result tree. The value must be a QName. If the QName does not have a prefix, then it identifies a method specified in this document and must be one of xml, html or text."

"encoding specifies the preferred character encoding that the XSLT processor should use to encode sequences of characters as sequences of bytes; the value of the attribute should be treated case-insensitively; the value must contain only characters in the range #x21 to #x7E (i.e. printable ASCII characters); the value should either be a charset registered with the Internet Assigned Numbers Authority [IANA], [RFC2278] or start with X-"

"media-type specifies the media type (MIME content type) of the data that results from outputting the result tree; the charset parameter should not be specified explicitly; instead, when the top-level media type is text, a charset parameter should be added according to the character encoding actually used by the output method"

If I understand this correctly, this means the correct output specification is <xsl:output method="xml" encoding="utf-8" />, and <xsl:output media-type="text/xml; charset=UTF-8"/> should never be used. 

My suggestion would be to change XSLTResponseWriter.getContentType() in such a way that (in pseudocode):
if encoding is null
  encoding = "utf-8"
end if
if media-type is not null
  /* the next test is for compatibility with the workaround only */
  if media-type contains "charset="
    return media-type
  else
    return media-type + "; charset=" + encoding
  end if
else
  if method is "html" or the first element in the final output is <html>
    media-type = "text/html"
  elseif method is "text"
    media-type = "text/plain"
  else /* it must be xml */
    media-type = "text/xml"
  end if
  return media-type + "; charset=" + encoding
end if
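
For illustration, the pseudocode above could be translated into Java roughly as follows. This is a sketch only, not the attached patch and not the actual XSLTResponseWriter code: the class name ContentTypeSketch and the contentType(...) signature are hypothetical, and the "first element is <html>" check is left out because it would need access to the transformed output.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the pseudocode; not Solr code. */
final class ContentTypeSketch {

    /**
     * Builds a Content-Type value from the xsl:output attributes.
     * Any argument may be null when the stylesheet does not set it.
     */
    static String contentType(String method, String encoding, String mediaType) {
        // Default to UTF-8 when the stylesheet declares no encoding.
        String charset = (encoding == null) ? "utf-8" : encoding;

        if (mediaType != null) {
            // Compatibility with the old workaround: an explicit charset
            // inside media-type is passed through untouched, which also
            // avoids emitting a second charset parameter.
            if (mediaType.toLowerCase(Locale.ROOT).contains("charset=")) {
                return mediaType;
            }
            return mediaType + "; charset=" + charset;
        }

        // No media-type given: derive a default from the output method.
        String defaultType;
        if ("html".equals(method)) {
            defaultType = "text/html";
        } else if ("text".equals(method)) {
            defaultType = "text/plain";
        } else {
            defaultType = "text/xml"; // "xml", or no method specified
        }
        return defaultType + "; charset=" + charset;
    }

    public static void main(String[] args) {
        System.out.println(contentType("xml", "utf-8", null));
        System.out.println(contentType(null, null, "text/xml; charset=UTF-8"));
    }
}
```

With this shape, a stylesheet that only declares method and encoding gets a charset appended, while one that already uses the media-type workaround is returned unchanged.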



[jira] Commented: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543249 ] 

Lance Norskog commented on SOLR-412:
------------------------------------

I am not an XSL expert. From what I can tell, the XSLT
documentation says that these two forms:
    <xsl:output method="xml" encoding="utf-8" />
    <xsl:output media-type="text/xml; charset=UTF-8" />
are equivalent. It seems that both should create XML
encoded in UTF-8, and both should create the same
Content-Type header line. My bug report is that the
media-type form works, but the method="xml" form
does not.

I would not be surprised to learn that the
method="xml" form does not do what it looks like; at
this point I have no respect for the XSLT language. 
Thank you for your time and attention to my humble
complaint.

Lance

--- "Hoss Man (JIRA)" <ji...@apache.org> wrote:

https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542609





[jira] Updated: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Age Jan Kuperus (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Age Jan Kuperus updated SOLR-412:
---------------------------------

    Attachment: diff-2009-10-22

Attached a patch against the 2009-10-22 daily tgz as we implemented it, which correctly handles all legal situations we tried, including the defaults.

This patch does not explicitly handle two corner cases (this is documented in the patch), which could lead to unexpected results (I can't test that here):

1) HTML documents without an explicit <xsl:output method="html" .../> will be treated as XML. IMHO this situation should never occur, as it is bad XSLT programming practice.

2) The (IMHO incorrect) previous solution (<xsl:output media-type="...; charset=..." encoding="..."/>) will result in a double charset definition. Although that is incorrect, it is accepted without error by Firefox and possibly by all browsers (I did not test that). As stated before, it should not be done that way.



[jira] Commented: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779180#action_12779180 ] 

Hoss Man commented on SOLR-412:
-------------------------------

> IMHO the documentation in xslt 1.0 (http://www.w3.org/TR/xslt#output) is a bit clearer on the usage of these fields

I'm not sure if looking at an *older* specification proposal is really the right way to go here.  Shouldn't the fact that all of that language was removed from the XSLT 2.0 spec suggest that it was changed for a reason?



[jira] Commented: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782701#action_12782701 ] 

Hoss Man commented on SOLR-412:
-------------------------------

Ok, i've become convinced that we should do something like the pseudo-code Age posted above ... not so much by the additional xslt-xquery-serialization reference, but by thinking through the practical use cases...

* If a template specifies a charset in its media-type property, it doesn't change anything for those people
* If people have media-types w/o charsets but they do declare an encoding, then we're matching their wishes as best we can, and if they don't like it they can add a charset to the media-type

Age: I haven't looked carefully at your patch, but if we can fix the double charset problem you described (which should be easy with a simple substring test) then i'm +1 for making this change.



[jira] Commented: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Age Jan Kuperus (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780284#action_12780284 ] 

Age Jan Kuperus commented on SOLR-412:
--------------------------------------

I agree, although I was pretty sure XSLT 2.0 was even stricter; I just could not immediately find a formal reference.
So I did some more research today and found the following confirmation in http://www.w3.org/TR/xslt-xquery-serialization/, which is part of XSLT 2.0:

"media-type: A string of Unicode characters specifying the media type (MIME content type) [RFC2046]; the charset parameter of the media type MUST NOT be specified explicitly in the value of the media-type parameter".

Therefore I would like you to have a look at my patch and comment on it (or even commit it ;-). Committing this patch would also require the patches for SOLR-233 and SOLR-514  to be undone (as their results are illegal in both XSLT 1.0 and 2.0), and possibly has documentation consequences.



[jira] Commented: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542609 ] 

Hoss Man commented on SOLR-412:
-------------------------------

i'm confused as to what the fix here would be... what do you think Solr should do instead of the current behavior?  the XSLTResponseWriter takes the media-type and uses it as the Content-Type ... Tomcat decides that since the Content-Type doesn't have a charset, it will add one (its default, which i'm assuming can be configured in the tomcat configs)

...what would you suggest as an improvement?

(i agree UTF-8 should be the Solr default as much as possible ... but the point of the XSLTResponseWriter is to give the xslt creator total control over the content-type ... doing anything that might circumvent their intentions seems like a bad idea).
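
The interaction described in this comment can be pictured with a small toy model. This is only an illustration of the reported behaviour, not Tomcat or Solr code: the class ContainerCharsetModel and its method are hypothetical, and the ISO-8859-1 default is taken from the header shown in the original report.

```java
import java.util.Locale;

/** Toy model of a container that fills in a missing charset; hypothetical. */
final class ContainerCharsetModel {

    // Assumed container default, matching the header in the bug report.
    static final String CONTAINER_DEFAULT = "ISO-8859-1";

    /**
     * Returns the Content-Type a client would see: the declared value if it
     * already names a charset, otherwise the declared value plus the
     * container's default charset.
     */
    static String effectiveContentType(String declared) {
        if (declared.toLowerCase(Locale.ROOT).contains("charset=")) {
            return declared;
        }
        return declared + ";charset=" + CONTAINER_DEFAULT;
    }

    public static void main(String[] args) {
        // media-type taken from method="xml" with no charset parameter
        System.out.println(effectiveContentType("text/xml"));
        // media-type from the workaround in the report
        System.out.println(effectiveContentType("text/xml; charset=UTF-8"));
    }
}
```

In this model the first call yields the ISO-8859-1 header from the report, and the second passes the workaround value through unchanged, which is why the workaround appeared to fix the problem.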




[jira] Updated: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-412:
--------------------------

    Component/s:     (was: search)
                 Response Writers



[jira] Commented: (SOLR-412) XsltWriter does not output UTF-8 by default

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545160 ] 

Hoss Man commented on SOLR-412:
-------------------------------

based on my reading of: http://www.w3.org/TR/xslt20/#element-output

the "method" attribute exists solely to instruct the transformer how to generate the output ... it appears to exist largely to support hacks for html but also to support plain text output.

"encoding" dictates the actual character encoding used in the output stream.

"media-type" is ... the media-type, which if unspecified defaults to either "text/xml" if method="xml" or "text/html" or "text/plain" for the corresponding methods ... but the default media-type does not ever seem to be influenced by the "encoding" attribute.


I'm not convinced there isn't *something* Solr can do to handle this situation better, i just don't know what it is.
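
One way to see what a stylesheet actually declares for the three attributes discussed above is to ask the standard JAXP API for the transformer's output properties. The sketch below is an assumption-laden illustration, not Solr code: the class OutputPropertiesSketch, the SAMPLE_XSL stylesheet and the outputProperty helper are made up for this example, and the value reported for an unspecified media-type depends on the XSLT processor in use.

```java
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo of reading xsl:output settings through JAXP. */
final class OutputPropertiesSketch {

    // Minimal stylesheet using the form from the bug report.
    static final String SAMPLE_XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" encoding=\"utf-8\"/>"
        + "<xsl:template match=\"/\"><out/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Compiles the stylesheet and returns one output property, or null. */
    static String outputProperty(String stylesheet, String key) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            return t.getOutputProperty(key);
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("method     = " + outputProperty(SAMPLE_XSL, OutputKeys.METHOD));
        System.out.println("encoding   = " + outputProperty(SAMPLE_XSL, OutputKeys.ENCODING));
        // May be a processor default or null; it is not derived from encoding.
        System.out.println("media-type = " + outputProperty(SAMPLE_XSL, OutputKeys.MEDIA_TYPE));
    }
}
```

The point of the sketch is that encoding and media-type are reported as separate properties, so any charset in the Content-Type header has to be combined from the two by the caller.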
