Posted to users@camel.apache.org by "Andreas A." <an...@gmail.com> on 2010/10/08 10:59:17 UTC

Charset conversion issue.

Hi

I'm interacting with a system that inputs and outputs text files in charset
Cp865.
I want to fetch files from the system and convert them to Cp1252 locally.
I want to convert outgoing files from Cp1252 to Cp865.

I'm trying to use a combination of
<convertBodyTo type="String" charset="Windows-1252" />
setting CamelCharsetName on the exchange
and System.setProperty("org.apache.camel.default.charset", "Cp865");


However, the conversion isn't succeeding: I get ? for some of the special
chars.

When doing this in pure Java with streams it works fine:

public static void main(String[] args) throws IOException {

    File infile = new File("data/Cp865.txt");
    File outfile = new File("data/Cp1252.txt");

    Reader in = new InputStreamReader(new FileInputStream(infile), "Cp865");
    Writer out = new OutputStreamWriter(new FileOutputStream(outfile), "Windows-1252");

    int c;

    while ((c = in.read()) != -1) {
        out.write(c);
    }

    in.close();
    out.close();
}

Any tips?
-- 
View this message in context: http://camel.465427.n5.nabble.com/Charset-conversion-issue-tp3204282p3204282.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Charset conversion issue.

Posted by Willem Jiang <wi...@gmail.com>.
On 10/8/10 8:36 PM, Andreas A. wrote:
>
> I have made this class to substitute the <convertBodyTo> tags.
>
> <from uri="file:/data/in" />
> <bean ref="CharsetConverter" method="toExternalCharset" />
> <bean ref="CharsetConverter" method="toInternalCharset" />
>
>
> public class CharsetConverter {
>
> 	public void toInternalCharset(Exchange exchange) throws Exception {
> 		String internalCharset = Charset.defaultCharset().name();
You need to set the exchange property before calling
getBody(String.class); otherwise Camel cannot know which charset
should be used.
> 		String converted = new
> String(exchange.getIn().getBody(String.class).getBytes(), internalCharset);
> 		exchange.getIn().setBody(converted);
> 		exchange.setProperty(Exchange.CHARSET_NAME, internalCharset);
> 	}
> 	
> 	public void toExternalCharset(Exchange exchange) throws Exception {
> 		String externalCharset =
> exchange.getContext().resolvePropertyPlaceholders("{{charset.external}}");
> 		System.out.println(externalCharset);
> 		String converted = new
> String(exchange.getIn().getBody(String.class).getBytes(), externalCharset);
> 		exchange.getIn().setBody(converted);
> 		exchange.setProperty(Exchange.CHARSET_NAME, externalCharset);
> 	}
> }
>
> However, this does not produce the same (correct) output as using
> convertBodyTo.


-- 
Willem
----------------------------------
Open Source Integration: http://www.fusesource.com
Blog:    http://willemjiang.blogspot.com (English)
          http://jnn.javaeye.com (Chinese)
Twitter: http://twitter.com/willemjiang

Re: Charset conversion issue.

Posted by "Andreas A." <an...@gmail.com>.
I have made this class to substitute the <convertBodyTo> tags.

<from uri="file:/data/in" />
<bean ref="CharsetConverter" method="toExternalCharset" />
<bean ref="CharsetConverter" method="toInternalCharset" />


public class CharsetConverter {

    public void toInternalCharset(Exchange exchange) throws Exception {
        String internalCharset = Charset.defaultCharset().name();
        String converted = new String(exchange.getIn().getBody(String.class).getBytes(), internalCharset);
        exchange.getIn().setBody(converted);
        exchange.setProperty(Exchange.CHARSET_NAME, internalCharset);
    }

    public void toExternalCharset(Exchange exchange) throws Exception {
        String externalCharset = exchange.getContext().resolvePropertyPlaceholders("{{charset.external}}");
        System.out.println(externalCharset);
        String converted = new String(exchange.getIn().getBody(String.class).getBytes(), externalCharset);
        exchange.getIn().setBody(converted);
        exchange.setProperty(Exchange.CHARSET_NAME, externalCharset);
    }
}

However, this does not produce the same (correct) output as using
convertBodyTo.
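
The pattern used in the class above, encoding a String to bytes and decoding those bytes again with a different charset, can be tried in isolation. The following is a minimal sketch, assuming explicit charsets instead of the platform default that getBytes() uses; the names ReDecodeDemo and reDecode are hypothetical and not part of Camel.

```java
import java.nio.charset.Charset;

final class ReDecodeDemo {

    // Mirrors the pattern new String(text.getBytes(a), b):
    // encode with one charset, then decode the bytes with another.
    static String reDecode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // "IBM865" and "windows-1252" are the canonical Java names
        // for Cp865 and Cp1252.
        Charset cp865 = Charset.forName("IBM865");
        Charset cp1252 = Charset.forName("windows-1252");
        String text = "\u00E6\u00F8\u00E5"; // the characters æ ø å

        // Same charset in both directions: the text survives unchanged.
        System.out.println(reDecode(text, cp865, cp865).equals(text));   // true

        // Mismatched charsets: the characters themselves are altered.
        System.out.println(reDecode(text, cp1252, cp865).equals(text));  // false
    }
}
```

A Java String holds Unicode characters and carries no charset of its own, so a charset only has an effect at the point where bytes are read or written.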

Re: Charset conversion issue.

Posted by Willem Jiang <wi...@gmail.com>.
On 10/8/10 8:13 PM, Andreas A. wrote:
>
> Where can I see what convertBodyTo translates to in Java code? I would like
> to see what it does.

You can find the code here[1]

[1]https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/main/java/org/apache/camel/processor/ConvertBodyProcessor.java

-- 
Willem

Re: Charset conversion issue.

Posted by Claus Ibsen <cl...@gmail.com>.
On Fri, Oct 8, 2010 at 2:13 PM, Andreas A. <an...@gmail.com> wrote:
>
> Where can I see what convertBodyTo translates to in Java code? I would like
> to see what it does.

Every EIP has an xxxDefinition class in the model package.

So go find ConvertBodyDefinition and go from there.





-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus

Re: Charset conversion issue.

Posted by "Andreas A." <an...@gmail.com>.
Where can I see what convertBodyTo translates to in Java code? I would like
to see what it does.

Re: Charset conversion issue.

Posted by Willem Jiang <wi...@gmail.com>.
On 10/8/10 8:03 PM, Andreas A. wrote:
>
> Hi Willem
>
> Can you explain why you think it makes sense?
>
>
> - Andreas
<from uri="file:data/in/test" />

At this point the message body is an InputStream. The following conversion
changes the body to a String, decoded with the charset Cp865.

<convertBodyTo charset="Cp865" type="String" />

The next conversion does not change the message body. It only sets the
exchange property with key Exchange.CHARSET_NAME to the value "Cp1252".
When the message is sent to a File endpoint, the Camel converter turns the
String into a ByteArrayInputStream using the charset Cp1252.

<convertBodyTo charset="Cp1252" type="String" />

I think you can write a correct custom processor with this information.
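
The two steps described above amount to a decode followed by an encode. The following plain Java sketch illustrates that idea only; it is not Camel's implementation, and the class name CharsetTranscoder and its method are hypothetical.

```java
import java.nio.charset.Charset;

final class CharsetTranscoder {

    // Step 1: decode the raw bytes into a String using the source charset.
    // Step 2: encode that String into bytes using the target charset.
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String text = new String(input, source);
        return text.getBytes(target);
    }

    public static void main(String[] args) {
        Charset cp865 = Charset.forName("IBM865");        // canonical name for Cp865
        Charset cp1252 = Charset.forName("windows-1252"); // canonical name for Cp1252

        // æ ø å as they are stored in a Cp865 file.
        byte[] external = {(byte) 0x91, (byte) 0x9B, (byte) 0x86};

        // Print the Cp1252 bytes in hex.
        StringBuilder hex = new StringBuilder();
        for (byte b : transcode(external, cp865, cp1252)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // E6 F8 E5
    }
}
```

Swapping the two charset arguments gives the outgoing direction, from Cp1252 to Cp865.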

-- 
Willem

Re: Charset conversion issue.

Posted by "Andreas A." <an...@gmail.com>.
Hi Willem

Can you explain why you think it makes sense?


- Andreas

Re: Charset conversion issue.

Posted by Willem Jiang <wi...@gmail.com>.
On 10/8/10 7:41 PM, Andreas A. wrote:
>
> Hi
>
> Ok I will make a sample later, I needed a solution urgently though :)
>
> I just tested this:
>
> <from uri="file:data/in/test" />
> <convertBodyTo charset="Cp865" type="String" />
> <convertBodyTo charset="Cp1252" type="String" />
>
> And this results in the conversion being correct *shrugs*.

I think this solution makes sense.
If you don't like it, you can do the conversion in your own custom processor.

-- 
Willem

Re: Charset conversion issue.

Posted by "Andreas A." <an...@gmail.com>.
Hi

Ok I will make a sample later, I needed a solution urgently though :)

I just tested this:

<from uri="file:data/in/test" />
<convertBodyTo charset="Cp865" type="String" />
<convertBodyTo charset="Cp1252" type="String" />

And this results in the conversion being correct *shrugs*.

Re: Charset conversion issue.

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Create a small project / unit test and attach it to a JIRA ticket.
And make sure those txt files are saved in the encoding you expect.

Then we can do a test on windows to ensure it works as expected.






-- 
Claus Ibsen

Re: Charset conversion issue.

Posted by "Andreas A." <an...@gmail.com>.
After a lot of trial and error I can see that the files are only read
correctly as Cp865 if I set the system property
"org.apache.camel.default.charset" to Cp865. If I try anything else I get
? ? ? for characters such as æ ø å. Can someone explain the mechanics behind
this? I thought that the body of the message would just be a GenericFile and
that the conversion would not happen until I ask for the file as a String.
I.e. it should be enough to just set the charset on the exchange under <from>,
and then when a component wants to convert the message to a String it will use
the correct encoding. This is not the behaviour I experience.
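
One plausible source of the ? characters, not confirmed in this thread, is that the Cp865 bytes are decoded with a different charset before being encoded for output. The sketch below assumes UTF-8 as that wrong charset purely for illustration; the class QuestionMarkDemo and its method misread are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {

    // Decodes raw bytes with an assumed charset, then encodes for the target.
    static byte[] misread(byte[] raw, Charset assumed, Charset target) {
        return new String(raw, assumed).getBytes(target);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // æ ø å as they are stored in a Cp865 file.
        byte[] cp865Bytes = {(byte) 0x91, (byte) 0x9B, (byte) 0x86};

        // Assumption for illustration: the bytes are decoded as UTF-8.
        // They are invalid UTF-8, so the decoder substitutes U+FFFD, and
        // the Cp1252 encoder then writes '?' for each U+FFFD.
        byte[] out = misread(cp865Bytes, StandardCharsets.UTF_8, cp1252);
        System.out.println(new String(out, StandardCharsets.US_ASCII));

        // An encoder can report in advance whether a character is representable.
        System.out.println(cp1252.newEncoder().canEncode('\uFFFD')); // false
    }
}
```

Once a character has been replaced during decoding, the original byte is gone, which is why the charset has to be right at the moment the file content first becomes a String.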

Re: Charset conversion issue.

Posted by "Andreas A." <an...@gmail.com>.
Is it correct that setting "org.apache.camel.default.charset" is the only way
to make Camel 2.4 read a file in a specific charset?

Re: Charset conversion issue.

Posted by "Andreas A." <an...@gmail.com>.
Hi

I'm doing as below now which works ok - but isn't this what <convertBodyTo>
is supposed to do for me?


	@Override
	public void process(Exchange exchange) throws Exception {
		String converted = new
String(exchange.getIn().getBody(String.class).getBytes("Cp1252"));
		exchange.getIn().setBody(converted);
		exchange.setProperty("CamelCharsetName", "Cp1252");
	}

Re: Charset conversion issue.

Posted by "Andreas A." <an...@gmail.com>.
Maybe the "charset" option on the File/FTP endpoint from 2.5 is what I want.
Use Cp865 on the consumer and Cp1252 on the producer, and vice versa?