Posted to users@tomee.apache.org by using namespace <fo...@bk.ru> on 2015/06/25 15:21:58 UTC

Encoding issue

I have deployed a web service on TomEE 1.7.1 and am currently having an
encoding problem when working with the request XML data. The web service
implements one method, which receives XML data inside a SOAP message like
the following:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:soap="http://tempuri.org/soaprequest">
   <soapenv:Header/>
   <soapenv:Body>
      <soap:soaprequest>
         <soap:streams>
            <soap:soapin contentType="?">
               <soap:Value>
                  

                     <tag_a>cyrillic text here...</tag_a>
                  

               </soap:Value>
            </soap:soapin>
         </soap:streams>
      </soap:soaprequest>
   </soapenv:Body>
</soapenv:Envelope>

Inside the web-service implementation class I retrieve everything from 
 tag and cast it to String:

    Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
    Node node = (Node) soapinElement;
    Document document = node.getOwnerDocument();
    DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
    LSSerializer serializer = domImplLS.createLSSerializer();
    LSOutput output = domImplLS.createLSOutput();
    output.setEncoding("UTF-8");
    Writer stringWriter = new StringWriter();
    output.setCharacterStream(stringWriter);
    serializer.write(document, output);
    String soapinString = stringWriter.toString();

And then I put soapinString into a CLOB column in an Oracle database.
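
For reference, the serialization step above can be reproduced outside the container. The following is a minimal sketch using only the JDK; the class name, the helper names and the sample text are hypothetical, and it assumes the incoming bytes carry an XML declaration that names their real encoding. Under that assumption the parser decodes the bytes itself, the DOM holds ordinary Unicode text, and writing it to a StringWriter should not garble anything, which suggests that in a case like the one described the damage happens before the DOM is built.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class DomToStringSketch {

    // Parse raw bytes; the parser takes the charset from the XML declaration.
    static Document parse(byte[] xmlBytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
    }

    // Same LSSerializer approach as in the post, writing to a character stream.
    static String serialize(Document document) {
        DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");
        StringWriter writer = new StringWriter();
        output.setCharacterStream(writer);
        serializer.write(document, output);
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: three Cyrillic letters, written as escapes.
        String text = "\u041E\u0412\u0414";
        String xml = "<?xml version=\"1.0\" encoding=\"windows-1251\"?>"
                + "<tag_a>" + text + "</tag_a>";
        byte[] cp1251Bytes = xml.getBytes(Charset.forName("windows-1251"));

        String result = serialize(parse(cp1251Bytes));
        // Reports whether the Cyrillic text survived the round trip.
        System.out.println(result.contains(text));
    }
}
```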

Everything works when the SOAP message is encoded in UTF-8, but I get
unreadable characters when the SOAP message has a different encoding, such
as CP1251. What I see in Oracle as a result is:

                  

                     <tag_a>РћР’Р” Р’РћР</tag_a>
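
The first three character pairs in the garbled value above look like what appears when the UTF-8 bytes of the Cyrillic text "ОВД" are decoded as windows-1251. The sketch below reproduces that effect with the JDK only; the class and method names are hypothetical, and whether this is the exact path the data took on the server is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeSketch {

    // Encode text as UTF-8, then (wrongly) decode those bytes as windows-1251.
    static String utf8ReadAsCp1251(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1251"));
    }

    public static void main(String[] args) {
        // U+041E U+0412 U+0414: the Cyrillic capitals O, V, D.
        String original = "\u041E\u0412\u0414";
        String garbled = utf8ReadAsCp1251(original);

        // Each Cyrillic letter is two UTF-8 bytes, so it becomes two characters.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Each letter turns into U+0420 followed by a second character, which matches the repeating pattern in the stored value.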
                  


I tried encoding conversion like this:

    Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
    Node node = (Node) soapinElement;
    Document document = node.getOwnerDocument();
    DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
    LSSerializer serializer = domImplLS.createLSSerializer();
    LSOutput output = domImplLS.createLSOutput();
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    output.setByteStream(byteArrayOutputStream);
    byte[] result = byteArrayOutputStream.toByteArray();
    InputStream is = new ByteArrayInputStream(result);
    Reader reader = new InputStreamReader(is, "windows-1251");
    OutputStream out = new ByteArrayOutputStream();
    Writer writer = new OutputStreamWriter(out, "UTF-8");
    writer.write("\uFEFF");
    char[] buffer = new char[10];
    int read;
    while ((read = reader.read(buffer)) != -1) {
        writer.write(buffer, 0, read);
    }
    reader.close();
    writer.close();
    serializer.write((Node) out, output);
    String soapinString = output.toString();

But it produces something that looks like byte code.
I would like to ask for suggestions on possible ways to convert the
encoding to UTF-8.
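
A few things stand out in the attempt above: toByteArray() is called before the serializer has written anything to the byte stream, so the array is empty; an OutputStream is cast to Node, which cannot succeed at runtime; and output.toString() most likely returns the object's default string form rather than the XML. The Reader/Writer loop itself is a sound way to transcode raw bytes. Below is a minimal JDK-only sketch of just that step, with hypothetical class and method names. It assumes the original bytes and their real charset are known; once the text is already in a DOM or a String, re-encoding bytes this way is not what is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeSketch {

    // Decode the input bytes with 'from' and re-encode the characters with 'to'.
    static byte[] transcode(byte[] input, Charset from, Charset to) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(out, to)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        } // closing the writer flushes it before the bytes are collected
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Charset cp1251 = Charset.forName("windows-1251");
        // Hypothetical sample text, one byte per letter in windows-1251.
        byte[] source = "\u041E\u0412\u0414".getBytes(cp1251);
        byte[] utf8 = transcode(source, cp1251, StandardCharsets.UTF_8);
        System.out.println(source.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```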



--
View this message in context: http://tomee-openejb.979440.n4.nabble.com/Encoding-issue-tp4675408.html
Sent from the TomEE Users mailing list archive at Nabble.com.

Re: Encoding issue

Posted by Romain Manni-Bucau <rm...@gmail.com>.

Well, as long as the HTTP spec (not any Java spec) considers that UTF-8
can't be the default, I guess we'll still be here...


Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Tomitriber
<http://www.tomitribe.com>

2015-06-29 10:09 GMT-07:00 Jean-Louis Monteiro <jl...@tomitribe.com>:

> I've been creating such a trick for year, can't believe we are still here
> today.
> Not a useful comment but this thread made me react again.

Re: Encoding issue

Posted by Jean-Louis Monteiro <jl...@tomitribe.com>.

I've been writing this kind of workaround for years; I can't believe we are
still here today.
Not a useful comment, but this thread made me react again.

--
Jean-Louis Monteiro
http://twitter.com/jlouismonteiro
http://www.tomitribe.com

On Sat, Jun 27, 2015 at 4:35 PM, Romain Manni-Bucau <rm...@gmail.com>
wrote:

> Hi
>
> sounds normal (http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q1),
> maybe add a filter setting the encoding (request.setCharacterEncoding(
> "UTF-8");)

Re: Encoding issue

Posted by Romain Manni-Bucau <rm...@gmail.com>.

Hi,

Sounds normal (see http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q1).
Maybe add a filter that sets the encoding
(request.setCharacterEncoding("UTF-8");).
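
One way to get such a filter without writing code is the SetCharacterEncodingFilter that ships with Tomcat. The fragment below is a sketch of a web.xml declaration for it; the filter name and the URL pattern are arbitrary choices, and it assumes the filter class is available in the Tomcat version bundled with the TomEE release in use. By default this filter only applies the configured encoding when the request does not declare one itself, so it may not change anything for clients that send an explicit charset.

```xml
<filter>
  <filter-name>setCharacterEncodingFilter</filter-name>
  <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>setCharacterEncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
```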

On 27 June 2015 at 12:17, "using namespace" <fo...@bk.ru> wrote:

> I have deployed a web-service on TomEE 1.7.1 and currently having encoding
> problem when I work with request xml data. The web-service implements one
> method, which receives and xml data inside a SOAP message like following:

Re: Encoding issue

Posted by using namespace <fo...@bk.ru>.

Thanks everyone for the answers; a resolution for the issue was found:

    String soapinString = (new String(
            StringEscapeUtils.unescapeHtml4(stringWriter.toString())
                    .getBytes(StandardCharsets.ISO_8859_1),
            "UTF-8")).replaceAll("\\p{Cntrl}", "");
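
As far as can be told from the one-liner, it unescapes character references, treats every resulting character as a single byte via ISO-8859-1, decodes those bytes as UTF-8, and finally strips control characters. The sketch below illustrates the last three steps with the JDK only; the class and method names and the sample input are hypothetical, and the unescaping step (StringEscapeUtils comes from Apache Commons Lang) is left out.

```java
import java.nio.charset.StandardCharsets;

class Latin1RepairSketch {

    // Treat each char (expected range 0-255) as one byte, then decode as UTF-8.
    static String redecode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Same cleanup as in the post: removes ASCII control characters.
    static String stripControls(String text) {
        return text.replaceAll("\\p{Cntrl}", "");
    }

    public static void main(String[] args) {
        // UTF-8 bytes D0 9E D0 92 D0 94 seen as six separate Latin-1 characters.
        String misdecoded = "\u00D0\u009E\u00D0\u0092\u00D0\u0094";
        String repaired = redecode(misdecoded);
        for (char c : repaired.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();

        // Line breaks and tabs are control characters too, so they are removed.
        System.out.println(stripControls("line1\n\tline2"));
    }
}
```

Note that \p{Cntrl} also matches newlines and tabs, so the stored XML ends up on a single line. If the layout matters, a narrower pattern may be preferable.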



--
View this message in context: http://tomee-openejb.979440.n4.nabble.com/Encoding-issue-tp4675408p4675467.html
Sent from the TomEE Users mailing list archive at Nabble.com.