Posted to users@tomee.apache.org by using namespace <fo...@bk.ru> on 2015/06/25 15:21:58 UTC
Encoding issue
I have deployed a web service on TomEE 1.7.1 and am currently having an encoding
problem when I work with the request XML data. The web service implements one
method, which receives XML data inside a SOAP message like the following:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:soap="http://tempuri.org/soaprequest">
<soapenv:Header/>
<soapenv:Body>
<soap:soaprequest>
<soap:streams>
<soap:soapin contentType="?">
<soap:Value>
<tag_a>cyrillic text here...</tag_a>
</soap:Value>
</soap:soapin>
</soap:streams>
</soap:soaprequest>
</soapenv:Body>
</soapenv:Envelope>
Inside the web-service implementation class I retrieve everything from the
tag and serialize it to a String:
Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
Node node = (Node) soapinElement;
Document document = node.getOwnerDocument();
DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
LSSerializer serializer = domImplLS.createLSSerializer();
LSOutput output = domImplLS.createLSOutput();
output.setEncoding("UTF-8");
Writer stringWriter = new StringWriter();
output.setCharacterStream(stringWriter);
serializer.write(document, output);
String soapinString = stringWriter.toString();
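A self-contained sketch of the serialization step above, assuming only the JDK's built-in DOM implementation. As one possible variation, it passes the element itself to the serializer instead of its owner document. The names DomToStringDemo, serialize, parse and the sample XML are hypothetical and are not part of the original service.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical demo class; it stands in for the web-service code above.
final class DomToStringDemo {

    // Serializes a single DOM node to a String through a character stream.
    static String serialize(Node node) {
        Document owner = (node instanceof Document) ? (Document) node : node.getOwnerDocument();
        DOMImplementationLS ls = (DOMImplementationLS) owner.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        // Leave out the <?xml ...?> declaration so only the fragment is returned.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");
        StringWriter writer = new StringWriter();
        output.setCharacterStream(writer);
        serializer.write(node, output);
        return writer.toString();
    }

    // Parses an XML string; checked exceptions are wrapped to keep the demo short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Cyrillic sample written with escapes so the source file encoding does not matter.
        Document doc = parse("<Value><tag_a>\u041E\u0412\u0414</tag_a></Value>");
        Node tagA = doc.getDocumentElement().getFirstChild();
        System.out.println(serialize(tagA));
    }
}
```

Because the output goes to a Writer, the result is an in-memory String; any byte-level encoding happens later, for example when the value is sent to the database.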
I then store soapinString in a CLOB column of an Oracle database.
Everything works when the SOAP message is encoded in UTF-8, but I get
unreadable characters when the SOAP message uses a different encoding, such as
CP1251. What I then see in Oracle is:
<tag_a>РћР’Р” Р’РћР</tag_a>
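As a hedged illustration of how this kind of garbling can arise: the visible part of the sample above is consistent with UTF-8 bytes that were decoded as windows-1251. The sketch below reproduces that pattern with the JDK only. The class MojibakeDemo, the method misdecode and the sample word are assumptions for illustration; they do not come from the original service, and the sketch does not show where in the actual request path the wrong decoding happened.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original service.
final class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes with a different charset.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        // Three Cyrillic letters, written with escapes so the source encoding does not matter.
        String original = "\u041E\u0412\u0414";
        String garbled = misdecode(original, cp1251);
        // Each two-byte UTF-8 sequence turns into two windows-1251 characters.
        System.out.println(original.length() + " chars -> " + garbled.length() + " chars");
        System.out.println(garbled);
    }
}
```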
I tried encoding conversion like this:
Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
Node node = (Node) soapinElement;
Document document = node.getOwnerDocument();
DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
LSSerializer serializer = domImplLS.createLSSerializer();
LSOutput output = domImplLS.createLSOutput();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
output.setByteStream(byteArrayOutputStream);
byte[] result = byteArrayOutputStream.toByteArray();
InputStream is = new ByteArrayInputStream(result);
Reader reader = new InputStreamReader(is, "windows-1251");
OutputStream out = new ByteArrayOutputStream();
Writer writer = new OutputStreamWriter(out, "UTF-8");
writer.write("\uFEFF");
char[] buffer = new char[10];
int read;
while ((read = reader.read(buffer)) != -1) {
    writer.write(buffer, 0, read);
}
reader.close();
writer.close();
serializer.write((Node) out, output);
String soapinString = output.toString();
But it produces something that looks like bytecode.
I would appreciate suggestions on how to convert the data to UTF-8
correctly.
--
View this message in context: http://tomee-openejb.979440.n4.nabble.com/Encoding-issue-tp4675408.html
Sent from the TomEE Users mailing list archive at Nabble.com.
Re: Encoding issue
Posted by Romain Manni-Bucau <rm...@gmail.com>.
Well, as long as the HTTP spec (not any Java spec) considers that UTF-8 can't be
the default, I guess we'll be there...
Romain Manni-Bucau
2015-06-29 10:09 GMT-07:00 Jean-Louis Monteiro <jl...@tomitribe.com>:
> I've been creating such a trick for years; I can't believe we are still here
> today.
> Not a useful comment, but this thread made me react again.
Re: Encoding issue
Posted by Jean-Louis Monteiro <jl...@tomitribe.com>.
I've been creating such a trick for years; I can't believe we are still here
today.
Not a useful comment, but this thread made me react again.
--
Jean-Louis Monteiro
http://twitter.com/jlouismonteiro
http://www.tomitribe.com
On Sat, Jun 27, 2015 at 4:35 PM, Romain Manni-Bucau <rm...@gmail.com>
wrote:
> Hi,
>
> That sounds normal (see http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q1).
> Maybe add a filter that sets the encoding
> (request.setCharacterEncoding("UTF-8");).
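The decision made by the filter suggested in the quoted reply can be sketched without the servlet API: use the charset the client declared, and fall back to UTF-8 only when none is declared. In a real servlet filter this would typically be a null check on request.getCharacterEncoding() followed by request.setCharacterEncoding("UTF-8") before the body is read. The class RequestCharsetDemo and the method resolveCharset below are hypothetical helpers for illustration, not Tomcat or TomEE API, and the parsing is deliberately simplified.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; it only mimics the choice an encoding filter has to make.
final class RequestCharsetDemo {

    // Returns the charset named in a Content-Type header value,
    // or UTF-8 when no usable charset parameter is present.
    static Charset resolveCharset(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String param = part.trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = param.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: fall through to the default.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("text/xml; charset=windows-1251"));
        System.out.println(resolveCharset("text/xml"));
    }
}
```

Falling back only when nothing is declared keeps requests that state their own encoding, such as CP1251, from being forced through UTF-8.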
Re: Encoding issue
Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hi,
That sounds normal (see http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q1).
Maybe add a filter that sets the encoding (request.setCharacterEncoding("UTF-8");).
Re: Encoding issue
Posted by using namespace <fo...@bk.ru>.
Thanks, everyone, for the answers. The resolution was found:
String soapinString = new String(
    StringEscapeUtils.unescapeHtml4(stringWriter.toString())
        .getBytes(StandardCharsets.ISO_8859_1),
    "UTF-8").replaceAll("\\p{Cntrl}", "");
--
View this message in context: http://tomee-openejb.979440.n4.nabble.com/Encoding-issue-tp4675408p4675467.html
Sent from the TomEE Users mailing list archive at Nabble.com.
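The core of the resolution above can be shown with the JDK only, as a hedged sketch. Text whose UTF-8 bytes were decoded as ISO-8859-1 can be recovered by encoding it back to ISO-8859-1, which maps every char below U+0100 to the byte with the same value, and then decoding those bytes as UTF-8. The StringEscapeUtils.unescapeHtml4 step (presumably from Apache Commons Lang) is left out here, and RedecodeDemo and redecode are hypothetical names. Note also that in Java regular expressions \p{Cntrl} matches [\x00-\x1F\x7F], so the final replaceAll in the resolution removes tabs and line breaks as well.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it isolates the re-decoding step of the resolution.
final class RedecodeDemo {

    // Reinterprets a string whose chars are really UTF-8 bytes
    // that were decoded as ISO-8859-1 somewhere upstream.
    static String redecode(String misdecoded) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three Cyrillic letters, written with escapes so the source encoding does not matter.
        String original = "\u041E\u0412\u0414";
        // Simulate the damage: UTF-8 bytes read as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(redecode(misdecoded).equals(original)); // prints true
    }
}
```

This only works when the wrong decoding really was ISO-8859-1; with a charset that has unmapped bytes, some of the original bytes may already be lost.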