Posted to users@camel.apache.org by Gustav Sinder <gu...@ferrologic.se> on 2015/09/22 12:28:40 UTC

RE: Wrong charset when using FTP2 component, locale issue?

I finally found a solution to this: use the same charset (iso-8859-1) for convertBodyTo as for the ftp endpoint, i.e.:
<convertBodyTo type="java.lang.String" charset="iso-8859-1"/>

Kind regards
/Gustav

-----Original Message-----
From: Gustav Sinder [mailto:gustav.sinder@ferrologic.se] 
Sent: den 2 juli 2015 13:48
To: users@camel.apache.org
Subject: RE: Wrong charset when using FTP2 component, locale issue?

I realize I should probably provide the full picture here:

The context consists of two routes where the first:
-----
<from uri="<my ftp including the binary mode and charset set">
<to uri="direct-vm:another-route-that-returns nothing?timeout=300000"/>

<!-- Needed for the splitter -->
<convertBodyTo type="java.lang.String"/>
<split streaming="true">
	<tokenize token="\n" group="5000"/>
	<wireTap uri="activemq:myQueue"/>
</split>
-----
And second:
-----
<from uri="activemq:myQueue"/>
<unmarshal>
	<csv delimiter=";"/>
</unmarshal>
<bean ref="transformCSV" method="validateAndTransform"/>
-----

After a lot of troubleshooting it seems that it's the splitter/tokenizer that messes up the data. It looks correct after the convertBodyTo but doesn't look ok after the tokenizer statement.

Is the tokenizer doing anything here that I should be aware of?

Thanks
/Gustav

-----Original Message-----
From: Gustav Sinder [mailto:gustav.sinder@ferrologic.se]
Sent: den 2 juli 2015 09:57
To: users@camel.apache.org
Subject: Wrong charset when using FTP2 component, locale issue?

Hi,

I've got an issue with files being parsed differently in different environments, specifically when handling Swedish characters.

The ftp endpoint is configured with:

- charset=iso-8859-1 (matches file format)
- binary=true

For debug purposes, I'm writing the data (in UTF-8) from a Java bean. My local environment correctly outputs (hex) c3b6 for 'ö'.
Our test environment outputs (hex) efbfbdefbfbd which is clearly based on erroneously parsed data.
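
For reference, both hex values can be reproduced in plain Java. This is only a minimal sketch, not Camel code: the class and method names are made up for illustration. It also rests on an unconfirmed assumption, namely that somewhere in the test environment the UTF-8 bytes for 'ö' get decoded as US-ASCII (a common JVM default under a POSIX locale on Java versions of that era).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class CharsetRoundTrip {

    // Render bytes as lowercase hex, e.g. {0xC3, 0xB6} -> "c3b6".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode raw bytes with the given charset, then re-encode the text as UTF-8.
    static String decodeThenUtf8Hex(byte[] raw, Charset decodeWith) {
        String text = new String(raw, decodeWith);
        return hex(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xF6};            // 'ö' in ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0xB6}; // 'ö' in UTF-8

        // Correct case: decode with the charset the file was written in.
        System.out.println(decodeThenUtf8Hex(latin1, StandardCharsets.ISO_8859_1)); // c3b6

        // Assumed failure case: UTF-8 bytes decoded as US-ASCII. Each byte
        // above 0x7F becomes U+FFFD, which is ef bf bd in UTF-8.
        System.out.println(decodeThenUtf8Hex(utf8, StandardCharsets.US_ASCII)); // efbfbdefbfbd
    }
}
```

The second line matches the pattern seen in the test environment, which suggests (but does not prove) that a byte-to-String conversion there is using a locale-dependent default rather than an explicit charset.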

Since the deployed code and test files are identical, is this an issue with Camel and the underlying system/locale?
I'm using Apache Camel 2.12.0.redhat-610379 (as part of JBoss Fuse).

My local (Linux) environment uses locale UTF-8:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

Our test (Linux) environment uses POSIX:
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
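
To compare what the JVM itself derives from these two locale setups, a small diagnostic can be run in each environment. This is a rough sketch using only standard JDK calls. The class name is hypothetical, and the output depends on the JVM version and startup flags, so no expected values are given.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, for illustration only.
public class ShowDefaultCharset {

    // Summarize the charset the JVM uses when none is specified explicitly.
    static String describe() {
        return "default charset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the two environments report different values, any conversion that does not name a charset explicitly may behave differently between them.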

Thanks
/Gustav