Posted to users@camel.apache.org by "Andreas A." <an...@gmail.com> on 2010/10/08 10:59:17 UTC
Charset conversion issue.
Hi
I'm interacting with a system that reads and writes text files in the charset
Cp865.
I want to fetch files from that system and convert them to Cp1252 locally,
and convert outgoing files from Cp1252 to Cp865.
I'm trying to use a combination of
<convertBodyTo type="String" charset="Windows-1252" />,
setting CamelCharsetName on the exchange,
and System.setProperty("org.apache.camel.default.charset", "Cp865");
However, the conversion isn't working: I get ? for some of the special
characters.
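A plain-JDK sketch of one way the ? characters can arise. It assumes, for illustration only, that the Cp865 bytes were decoded as UTF-8 before being written out as Windows-1252; the post does not say which charset was actually applied. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Camel.
final class QuestionMarkDemo {
    static final Charset CP865 = Charset.forName("Cp865");
    static final Charset CP1252 = Charset.forName("windows-1252");
    static final String NORDIC = "\u00e6\u00f8\u00e5"; // "æøå"

    // Decode the Cp865 bytes with the WRONG charset (UTF-8 assumed), then encode as Windows-1252.
    static String readWrong(byte[] cp865Bytes) {
        // The Cp865 bytes for æøå are not valid UTF-8, so each decodes to U+FFFD...
        String misread = new String(cp865Bytes, StandardCharsets.UTF_8);
        // ...and U+FFFD has no Windows-1252 mapping, so getBytes substitutes '?'.
        return new String(misread.getBytes(CP1252), CP1252);
    }

    // Decode with the charset the bytes were actually written in, then re-encode.
    static String readRight(byte[] cp865Bytes) {
        String decoded = new String(cp865Bytes, CP865);
        return new String(decoded.getBytes(CP1252), CP1252);
    }

    public static void main(String[] args) {
        byte[] input = NORDIC.getBytes(CP865);
        System.out.println(readWrong(input)); // ???
        System.out.println(readRight(input).equals(NORDIC)); // true
    }
}
```

The point of the contrast is that the ? is produced silently at encoding time, after an earlier decoding step has already lost the characters.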
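When I do this in pure Java with streams, it works fine:
import java.io.*;

// Enclosing class name is illustrative; the original post showed only main().
public class CharsetCopy {
    public static void main(String[] args) throws IOException {
        File infile = new File("data/Cp865.txt");
        File outfile = new File("data/Cp1252.txt");
        // Decode the input as Cp865 and re-encode every character as Windows-1252;
        // try-with-resources closes both streams even if an exception is thrown.
        try (Reader in = new InputStreamReader(new FileInputStream(infile), "Cp865");
             Writer out = new OutputStreamWriter(new FileOutputStream(outfile), "Windows-1252")) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        }
    }
}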
Any tips?
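For comparison, the same decode-then-encode idea as the main() method above, sketched as a reusable method (the name transcode is hypothetical) that copies through a char buffer instead of one character at a time, and exercised in memory rather than on the data/ files:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Camel.
final class StreamTranscoder {
    // Decode 'in' with 'from' and re-encode into 'out' with 'to'; closes both streams.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        try (Reader reader = new InputStreamReader(in, from);
             Writer writer = new OutputStreamWriter(out, to)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        } // closing the writer flushes any buffered bytes into 'out'
    }

    public static void main(String[] args) throws IOException {
        Charset cp865 = Charset.forName("Cp865");
        byte[] input = "\u00e6\u00f8\u00e5".getBytes(cp865); // "æøå" as Cp865 bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(input), cp865, out, Charset.forName("windows-1252"));
        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // E6 F8 E5
    }
}
```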
--
View this message in context: http://camel.465427.n5.nabble.com/Charset-conversion-issue-tp3204282p3204282.html
Sent from the Camel - Users mailing list archive at Nabble.com.
Re: Charset conversion issue.
Posted by Willem Jiang <wi...@gmail.com>.
On 10/8/10 8:36 PM, Andreas A. wrote:
>
> I have made this class to replace the <convertBodyTo> tags.
>
> <from uri="file:/data/in" />
> <bean ref="CharsetConverter" method="toExternalCharset" />
> <bean ref="CharsetConverter" method="toInternalCharset" />
>
>
> public class CharsetConverter {
>
> public void toInternalCharset(Exchange exchange) throws Exception {
> String internalCharset = Charset.defaultCharset().name();
You need to set the exchange property before calling getBody(String.class);
otherwise Camel cannot know which charset should be used.
> String converted = new
> String(exchange.getIn().getBody(String.class).getBytes(), internalCharset);
> exchange.getIn().setBody(converted);
> exchange.setProperty(Exchange.CHARSET_NAME, internalCharset);
> }
>
> public void toExternalCharset(Exchange exchange) throws Exception {
> String externalCharset =
> exchange.getContext().resolvePropertyPlaceholders("{{charset.external}}");
> System.out.println(externalCharset);
> String converted = new
> String(exchange.getIn().getBody(String.class).getBytes(), externalCharset);
> exchange.getIn().setBody(converted);
> exchange.setProperty(Exchange.CHARSET_NAME, externalCharset);
> }
> }
>
> However, this does not produce the same (correct) output as using
> convertBodyTo.
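A toy model of the ordering point above. It is not Camel's API: ToyExchange and its methods are hypothetical, and Windows-1252 stands in for the default charset as an assumption. It only shows that a byte-to-String conversion looks up the charset property when it runs, so setting the property afterwards cannot change a String that was already produced.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT the Camel API: it only mimics the idea that a
// byte[] -> String conversion reads the charset property at the moment it runs.
final class ToyExchange {
    static final String CHARSET_NAME = "CamelCharsetName"; // same key string used in this thread
    final Map<String, Object> properties = new HashMap<>();
    final byte[] body;
    final Charset fallback; // assumed stand-in for the default charset

    ToyExchange(byte[] body, Charset fallback) {
        this.body = body;
        this.fallback = fallback;
    }

    // Stand-in for getBody(String.class): decode with the property if it is set.
    String bodyAsString() {
        Object name = properties.get(CHARSET_NAME);
        Charset cs = (name != null) ? Charset.forName(name.toString()) : fallback;
        return new String(body, cs);
    }

    static String convert(boolean setPropertyFirst) {
        byte[] cp865 = "\u00e6\u00f8\u00e5".getBytes(Charset.forName("Cp865")); // "æøå"
        ToyExchange ex = new ToyExchange(cp865, Charset.forName("windows-1252"));
        if (setPropertyFirst) {
            ex.properties.put(CHARSET_NAME, "Cp865");
            return ex.bodyAsString(); // decoded as Cp865
        }
        String converted = ex.bodyAsString(); // decoded with the fallback charset
        ex.properties.put(CHARSET_NAME, "Cp865"); // too late: 'converted' is already built
        return converted;
    }

    public static void main(String[] args) {
        System.out.println(convert(false).equals("\u00e6\u00f8\u00e5")); // false
        System.out.println(convert(true).equals("\u00e6\u00f8\u00e5"));  // true
    }
}
```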
--
Willem
----------------------------------
Open Source Integration: http://www.fusesource.com
Blog: http://willemjiang.blogspot.com (English)
http://jnn.javaeye.com (Chinese)
Twitter: http://twitter.com/willemjiang
Re: Charset conversion issue.
Posted by "Andreas A." <an...@gmail.com>.
I have made this class to replace the <convertBodyTo> tags.
<from uri="file:/data/in" />
<bean ref="CharsetConverter" method="toExternalCharset" />
<bean ref="CharsetConverter" method="toInternalCharset" />
import java.nio.charset.Charset;
import org.apache.camel.Exchange;

public class CharsetConverter {

    public void toInternalCharset(Exchange exchange) throws Exception {
        String internalCharset = Charset.defaultCharset().name();
        String converted = new String(exchange.getIn().getBody(String.class).getBytes(), internalCharset);
        exchange.getIn().setBody(converted);
        exchange.setProperty(Exchange.CHARSET_NAME, internalCharset);
    }

    public void toExternalCharset(Exchange exchange) throws Exception {
        String externalCharset = exchange.getContext().resolvePropertyPlaceholders("{{charset.external}}");
        System.out.println(externalCharset);
        String converted = new String(exchange.getIn().getBody(String.class).getBytes(), externalCharset);
        exchange.getIn().setBody(converted);
        exchange.setProperty(Exchange.CHARSET_NAME, externalCharset);
    }
}
However, this does not produce the same (correct) output as using
convertBodyTo.
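A plain-Java sketch of what new String(s.getBytes(), charset) does in the class above: getBytes() with no argument encodes with the JVM default charset, and the String constructor then decodes those bytes with a different charset, so the text is reinterpreted rather than converted. UTF-8 is assumed as the default purely for illustration; the helper name reinterpret is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Camel.
final class ReinterpretDemo {
    // Encode with one charset, decode the same bytes with another:
    // the characters are not converted, the bytes are reinterpreted.
    static String reinterpret(String s, Charset encodedWith, Charset decodedWith) {
        return new String(s.getBytes(encodedWith), decodedWith);
    }

    public static void main(String[] args) {
        String nordic = "\u00e6\u00f8\u00e5"; // "æøå"
        // With a UTF-8 "default", each char becomes 2 bytes, and single-byte
        // Cp865 turns those 6 bytes into 6 unrelated characters.
        String garbled = reinterpret(nordic, StandardCharsets.UTF_8, Charset.forName("Cp865"));
        System.out.println(garbled.length()); // 6
        // Encoding and decoding with the same charset leaves the text unchanged.
        System.out.println(reinterpret(nordic, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(nordic)); // true
    }
}
```

A Java String is already decoded text; a charset only matters where bytes are read or written, which is why the conversion belongs at those two boundaries.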
Re: Charset conversion issue.
Posted by Willem Jiang <wi...@gmail.com>.
On 10/8/10 8:13 PM, Andreas A. wrote:
>
> Where can I see what convertBodyTo translates to in Java code? I would like
> to see what it does.
You can find the code here[1]
[1]https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/main/java/org/apache/camel/processor/ConvertBodyProcessor.java
--
Willem
Re: Charset conversion issue.
Posted by Claus Ibsen <cl...@gmail.com>.
On Fri, Oct 8, 2010 at 2:13 PM, Andreas A. <an...@gmail.com> wrote:
>
> Where can I see what convertBodyTo translates to in Java code? I would like
> to see what it does.
Every EIP has an xxxDefinition class in the model package.
So find ConvertBodyDefinition and go from there.
--
Claus Ibsen
Apache Camel Committer
Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus
Re: Charset conversion issue.
Posted by "Andreas A." <an...@gmail.com>.
Where can I see what convertBodyTo translates to in Java code? I would like
to see what it does.
Re: Charset conversion issue.
Posted by Willem Jiang <wi...@gmail.com>.
On 10/8/10 8:03 PM, Andreas A. wrote:
>
> Hi Willem
>
> Can you explain why you think it makes sense?
>
>
> - Andreas
<from uri="file:data/in/test" />
The message body is an InputStream; this conversion turns it into a String,
decoding the bytes with the charset Cp865:
<convertBodyTo charset="Cp865" type="String" />
This one does not change the message body (it is already a String); it only
sets the exchange property Exchange.CHARSET_NAME to "Cp1252". When the
message is sent to a File endpoint, the Camel converter turns the String into
a ByteArrayInputStream using the charset Cp1252:
<convertBodyTo charset="Cp1252" type="String" />
With this information you should be able to write a correct custom processor.
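A simplified, hypothetical model of the sequence described above. TwoStepRouteModel is not Camel code and ignores everything except the charset handling: the first step decodes the InputStream as Cp865, the second only records Cp1252, and the "file producer" encodes the String with whichever charset was recorded last.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the route above; NOT Camel's
// ConvertBodyProcessor, only the charset-related effects listed in the reply.
final class TwoStepRouteModel {
    static final String CHARSET_NAME = "CamelCharsetName";
    Object body;
    final Map<String, Object> properties = new HashMap<>();

    // Models <convertBodyTo type="String" charset="..."/>: record the charset,
    // and convert only if the body is not a String yet.
    void convertBodyToString(String charset) throws IOException {
        properties.put(CHARSET_NAME, charset);
        if (body instanceof InputStream) {
            byte[] raw = ((InputStream) body).readAllBytes();
            body = new String(raw, Charset.forName(charset));
        }
    }

    // Models the file producer: the String body is encoded with the recorded charset.
    byte[] bytesForFile() {
        Charset cs = Charset.forName((String) properties.get(CHARSET_NAME));
        return ((String) body).getBytes(cs);
    }

    static byte[] run(byte[] fileContent) throws IOException {
        TwoStepRouteModel ex = new TwoStepRouteModel();
        ex.body = new ByteArrayInputStream(fileContent); // <from uri="file:data/in/test" />
        ex.convertBodyToString("Cp865");  // decodes the bytes as Cp865
        ex.convertBodyToString("Cp1252"); // body already a String: only the property changes
        return ex.bytesForFile();         // encoded as Cp1252
    }

    public static void main(String[] args) throws IOException {
        byte[] in = "\u00e6\u00f8\u00e5".getBytes(Charset.forName("Cp865")); // "æøå"
        for (byte b : run(in)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // expected: E6 F8 E5
    }
}
```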
--
Willem
Re: Charset conversion issue.
Posted by "Andreas A." <an...@gmail.com>.
Hi Willem
Can you explain why you think it makes sense?
- Andreas
Re: Charset conversion issue.
Posted by Willem Jiang <wi...@gmail.com>.
On 10/8/10 7:41 PM, Andreas A. wrote:
>
> Hi
>
> OK, I will make a sample later; I needed a solution urgently though :)
>
> I just tested this:
>
> <from uri="file:data/in/test" />
> <convertBodyTo charset="Cp865" type="String" />
> <convertBodyTo charset="Cp1252" type="String" />
>
> And this results in the conversion being correct *shrugs*.
I think this solution makes sense.
If you don't like it, you can do the conversion in your own custom processor.
--
Willem
Re: Charset conversion issue.
Posted by "Andreas A." <an...@gmail.com>.
Hi
OK, I will make a sample later; I needed a solution urgently though :)
I just tested this:
<from uri="file:data/in/test" />
<convertBodyTo charset="Cp865" type="String" />
<convertBodyTo charset="Cp1252" type="String" />
And this results in the conversion being correct *shrugs*.
Re: Charset conversion issue.
Posted by Claus Ibsen <cl...@gmail.com>.
Hi
Create a small project / unit test and attach it to a JIRA ticket.
And make sure those txt files is saved in that encoding your expect.
Then we can do a test on windows to ensure it works as expected.
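One way to make sure a test fixture really is Cp865, regardless of how an editor saves it, is to generate it from code and check its raw bytes. A minimal sketch; the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical test-fixture helper, not part of Camel.
final class Cp865Fixture {
    static final Charset CP865 = Charset.forName("Cp865");

    // Writes the text as Cp865 bytes, independent of any editor or JVM default.
    static Path write(Path file, String text) throws IOException {
        return Files.write(file, text.getBytes(CP865));
    }

    // True if the file's raw bytes are exactly the Cp865 encoding of 'expected'.
    static boolean isCp865(Path file, String expected) throws IOException {
        return Arrays.equals(Files.readAllBytes(file), expected.getBytes(CP865));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("cp865-", ".txt");
        write(tmp, "\u00e6\u00f8\u00e5"); // "æøå"
        System.out.println(isCp865(tmp, "\u00e6\u00f8\u00e5")); // true
        Files.delete(tmp);
    }
}
```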
--
Claus Ibsen
Re: Charset conversion issue.
Posted by "Andreas A." <an...@gmail.com>.
After a lot of trial and error, I can see that the files are only read
correctly as Cp865 if I set the system property
"org.apache.camel.default.charset" to Cp865. If I try anything else, I get
"? ? ?" for characters such as æ ø å. Can someone explain the mechanics behind
this? I thought the message body would just be a GenericFile and that no
conversion would happen until I ask for the file as a String, i.e. it should
be enough to set the charset on the exchange right after <from>, and then,
when a component wants to convert the message to a String, it would use the
correct encoding. That is not the behaviour I see.
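A sketch of the kind of lookup order the question is about. The order shown (exchange property, then the org.apache.camel.default.charset system property, then the JVM default) is an assumption for illustration and is not taken from Camel's source; the class is hypothetical. It shows how a system-wide setting would win whenever a conversion runs before any exchange property has been set.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical lookup order, assumed for illustration (not copied from Camel):
// 1) exchange property, 2) system property, 3) JVM default charset.
final class CharsetLookup {
    static final String CHARSET_NAME = "CamelCharsetName";
    static final String DEFAULT_CHARSET_PROPERTY = "org.apache.camel.default.charset";

    static Charset resolve(Map<String, Object> exchangeProperties) {
        Object fromExchange = (exchangeProperties == null) ? null : exchangeProperties.get(CHARSET_NAME);
        if (fromExchange != null) {
            return Charset.forName(fromExchange.toString());
        }
        String fromSystem = System.getProperty(DEFAULT_CHARSET_PROPERTY);
        if (fromSystem != null) {
            return Charset.forName(fromSystem);
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.setProperty(DEFAULT_CHARSET_PROPERTY, "Cp865");
        // No exchange property yet: the system-wide setting is used.
        System.out.println(resolve(Map.of()));
        // An exchange property, when present, takes precedence.
        System.out.println(resolve(Map.of(CHARSET_NAME, "windows-1252"))); // windows-1252
        System.clearProperty(DEFAULT_CHARSET_PROPERTY);
    }
}
```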
Re: Charset conversion issue.
Posted by "Andreas A." <an...@gmail.com>.
Is it correct that setting "org.apache.camel.default.charset" is the only way
to make Camel 2.4 read a file in a specific charset?
Re: Charset conversion issue.
Posted by "Andreas A." <an...@gmail.com>.
Hi
I'm doing the following now, which works OK, but isn't this what <convertBodyTo>
is supposed to do for me?
@Override
public void process(Exchange exchange) throws Exception {
    String converted = new String(exchange.getIn().getBody(String.class).getBytes("Cp1252"));
    exchange.getIn().setBody(converted);
    exchange.setProperty("CamelCharsetName", "Cp1252");
}
Re: Charset conversion issue.
Posted by "Andreas A." <an...@gmail.com>.
Maybe the "charset" option on the File/FTP endpoint (new in 2.5) is what I want:
use Cp865 on the consumer and Cp1252 on the producer, and vice versa?
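A plain-JDK sketch of the two directions described above, decoding with Cp865 on the side that talks to the external system and encoding with Cp1252 locally, then the reverse. It does not use the Camel File/FTP component; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical plain-JDK sketch; it does not use the Camel File/FTP component.
final class TwoWayTranscode {
    static final Charset EXTERNAL = Charset.forName("Cp865");        // the other system's files
    static final Charset INTERNAL = Charset.forName("windows-1252"); // local files

    // Decode once with the source charset, encode once with the target charset.
    // Note: getBytes silently writes '?' for characters the target cannot represent.
    static void copy(Path from, Charset readAs, Path to, Charset writeAs) throws IOException {
        String text = new String(Files.readAllBytes(from), readAs);
        Files.write(to, text.getBytes(writeAs));
    }

    static void inbound(Path externalFile, Path internalFile) throws IOException {
        copy(externalFile, EXTERNAL, internalFile, INTERNAL);
    }

    static void outbound(Path internalFile, Path externalFile) throws IOException {
        copy(internalFile, INTERNAL, externalFile, EXTERNAL);
    }

    public static void main(String[] args) throws IOException {
        Path ext = Files.createTempFile("ext-", ".txt");
        Path loc = Files.createTempFile("loc-", ".txt");
        Path back = Files.createTempFile("back-", ".txt");
        Files.write(ext, "\u00e6\u00f8\u00e5".getBytes(EXTERNAL)); // "æøå" in Cp865
        inbound(ext, loc);
        outbound(loc, back);
        System.out.println(Arrays.equals(Files.readAllBytes(ext), Files.readAllBytes(back))); // true
        Files.delete(ext);
        Files.delete(loc);
        Files.delete(back);
    }
}
```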