Posted to users@camel.apache.org by Kevin Jackson <fo...@gmail.com> on 2010/03/24 16:25:22 UTC

FileConsumer always reads the data in system charset/encoding

Hi,

I have a Camel application deployed on RHEL5 with a default
encoding/locale of UTF-8.

I have to download data from a remote Windows server (CP1251 or
ISO-8859-1/latin-1).

My route breaks down the processing of the files into two steps:
1 - download
2 - consume and pass split/tokenized String/bytes to POJOs for further
processing.

My problem stems from the fact that I don't seem to have control over
the charset that the FileConsumer uses as it converts the file into a
String.  The data contains encoded chars which are corrupted if the
data is read as UTF-8 instead of as ISO-8859-1.

I have a simple test case of a file encoded as ISO-8859-1 and I can
read it with a specific charset and this allows me to process the data
without corruption.  If I read it as UTF-8, the data is corrupted.
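
To make the corruption concrete, here is a minimal plain-Java sketch (not
Camel code; the class and method names are made up for illustration). It
encodes a string as ISO-8859-1 and then decodes the same bytes once with
ISO-8859-1 and once with UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {

    /** Encodes text as ISO-8859-1 bytes, then decodes those bytes with the given charset. */
    static String roundTrip(String text, Charset decodeWith) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute; in ISO-8859-1 the accented letter is the single byte 0xE9.
        String original = "caf\u00e9";

        // Decoding with the charset the bytes were written in gives the text back.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.ISO_8859_1))); // true

        // Decoding as UTF-8 does not: a lone 0xE9 is not a valid UTF-8 sequence,
        // so the JDK substitutes the replacement character U+FFFD.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.UTF_8))); // false
    }
}
```

The bytes on disk are never wrong here; only the charset chosen at the
moment they are turned into a String decides whether the text survives.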

Is there any way I can instruct each of my FileConsumer endpoints to
consume the file using a specific charset/encoding?  I cannot change
the locale on the server to fix this as other files must be read as
UTF-8, not ISO-8859-1

I've looked at the camel source code and the way that camel consumes
files seems to rely on some kind of type coercion:

in FileBinding:
    public void loadContent(Exchange exchange, GenericFile<File> file) throws IOException {
        try {
            content = exchange.getContext().getTypeConverter()
                    .mandatoryConvertTo(byte[].class, file.getFile());
        } catch (NoTypeConversionAvailableException e) {
            throw IOHelper.createIOException(
                    "Cannot load file content: " + file.getAbsoluteFilePath(), e);
        }
    }

Is this the code that actually consumes the file and creates the
message, or should I be looking elsewhere?  I'm trying to add a
property to GenericFileEndpoint that will allow me to set the charset
via a URI parameter:
file://target/encoded/?charsetEncoding=ISO-8859-1

Thanks,
Kev

Re: FileConsumer always reads the data in system charset/encoding

Posted by Kevin Jackson <fo...@gmail.com>.
Hi,

> There used to be an issue in Camel with using CGLib proxied Spring
> beans, but I recall that was fixed in 2.2 as well.
> That issue only occurred when you used Camel @annotations, which caused
> them to not work as expected.
> Spring @Transactional is to my knowledge working.
>
> Maybe the Spring log can help. It logs something about bean XXX not
> eligible for bean post processing, which is the process
> Spring does when applying @ annotations and whatnot.
>

I'm not sure what the problem is.  My unit test inserts some
reference data in setUp(), which is correctly rolled back at the
end of the test.  The Camel route persists data as well, but that seems
to happen in a different transaction from the one the unit test is
running in.  I also have to call Thread.sleep(2000) to give the route
enough time to move, consume and process the file before testing that
the data is correctly persisted.  So there are obviously (at least)
2 threads.

As this is a 'final' integration test before I put the code into
production, I'm not too bothered that it doesn't behave as I would
expect, but it did throw up false negatives, which is what had me
looking in the Camel source code.

>
>> Thanks for the help and I will not post unless I have exhausted the
>> other possibilities.
>> Kev
>>
>
> Maybe I went a bit overboard. But the point is that people should
> refrain from posting 2 min after they hit an issue.
> And instead spend some time looking into the issue. And trying out
> with the latest release is also preferred as it may very well be fixed.

Don't worry - I was out of line not putting in all the info required -
I was focusing on the platform and the default encodings rather than
on the version of camel.  I don't mind being confronted with my own
stupidity, it's the only way to learn sometimes ;)

My point about shelling out to iconv was more to do with how Camel can
interact with native code.  I think a Camel component that allows you
to exec command line code would be useful in certain circumstances -
camel-cli (based on the commons-cli code).

> with the latest release is also preferred as it may very well be fixed.

This is sometimes not possible with production code (although it
doesn't apply to my particular case)

Thanks for your help,
Kev

Re: FileConsumer always reads the data in system charset/encoding

Posted by Claus Ibsen <cl...@gmail.com>.
On Wed, Mar 24, 2010 at 6:52 PM, Kevin Jackson <fo...@gmail.com> wrote:
> Hi,
>
> To clear this whole thing up - I can now report that everything is
> working as expected, the problem was (and still is) with Spring's
> transactional test support, not rolling back the data that my
> processing is adding to the database.  This meant that I was checking
> against an old value in the db and getting a test failure - the
> correct value was also in the db.
>
> So the problem has moved from Camel to Spring @Transactional, which is
> quite interesting as it works for other tests, just not this
> particular one.
>

There used to be an issue in Camel with using CGLib proxied Spring
beans, but I recall that was fixed in 2.2 as well.
That issue only occurred when you used Camel @annotations, which caused
them to not work as expected.
Spring @Transactional is to my knowledge working.

Maybe the Spring log can help. It logs something about bean XXX not
eligible for bean post processing, which is the process
Spring does when applying @ annotations and whatnot.


> Thanks for the help and I will not post unless I have exhausted the
> other possibilities.
> Kev
>

Maybe I went a bit overboard. But the point is that people should
refrain from posting 2 min after they hit an issue.
And instead spend some time looking into the issue. And trying out
with the latest release is also preferred as it may very well be fixed.



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus

Re: FileConsumer always reads the data in system charset/encoding

Posted by Kevin Jackson <fo...@gmail.com>.
Hi,

To clear this whole thing up - I can now report that everything is
working as expected, the problem was (and still is) with Spring's
transactional test support, not rolling back the data that my
processing is adding to the database.  This meant that I was checking
against an old value in the db and getting a test failure - the
correct value was also in the db.

So the problem has moved from Camel to Spring @Transactional, which is
quite interesting as it works for other tests, just not this
particular one.

Thanks for the help and I will not post unless I have exhausted the
other possibilities.
Kev

Re: FileConsumer always reads the data in system charset/encoding

Posted by Kevin Jackson <fo...@gmail.com>.
Hi,

> No, it does not read the file beforehand.
>
> Use Tracer to see the Message Body.
> And you have not stated which version of Camel you are using, despite
> it being highlighted on the how to get support page
> http://camel.apache.org/support.html
>
> Also you should check JIRA etc. to see if there is a known issue with it.
> For example:
> https://issues.apache.org/activemq/browse/CAMEL-2387

I am currently using 2.1; I will upgrade to 2.2 and see if this
resolves the issue for me.

Sorry to post without following the 'How to ask sensible questions' rule.

/me contrite
Kev

Re: FileConsumer always reads the data in system charset/encoding

Posted by Claus Ibsen <cl...@gmail.com>.
No, it does not read the file beforehand.

Use Tracer to see the Message Body.
And you have not stated which version of Camel you are using, despite
it being highlighted on the how to get support page
http://camel.apache.org/support.html

Also you should check JIRA etc. to see if there is a known issue with it.
For example:
https://issues.apache.org/activemq/browse/CAMEL-2387






Re: FileConsumer always reads the data in system charset/encoding

Posted by Kevin Jackson <fo...@gmail.com>.
Hi,

Here is my route:

<camel:camelContext xmlns="http://camel.apache.org/schema/spring">
    <endpoint id="fileConsumer"
              uri="file:///tmp?preMove=.inprogress&amp;move=.done&amp;moveFailed=.error&amp;delay=400&amp;noop=false"/>

    <camel:route id="integration-start" startupOrder="1">
        <camel:from uri="direct:start"/>
        <camel:to uri="file:///tmp"/>
    </camel:route>

    <camel:route id="test-route" startupOrder="2">
        <camel:from ref="fileConsumer"/>
        <camel:convertBodyTo type="java.lang.String" charset="UTF-8"/>
        <camel:split streaming="true">
            <camel:tokenize token="\*\*\n" regex="true"/>
            <camel:to uri="direct:out"/>
        </camel:split>
    </camel:route>

    <camel:route id="integration-end" startupOrder="3">
        <camel:from uri="direct:out"/>
        <camel:to uri="mock:result"/>
    </camel:route>
</camel:camelContext>

The file is encoded as ISO-8859-1; the locale on the system is UTF-8.

I think the file consumer is reading the data incorrectly *before* I
can convert it with <convertBodyTo>: the FileConsumer is reading the
data as UTF-8 when it should be reading it as ISO-8859-1.

Annoyingly, iconv -f ISO-8859-1 -t UTF-8 <file> works perfectly and
converts the characters correctly.  I really don't want to have to
shell out to iconv to perform the conversion before consuming the file,
but at the moment it seems to be the sanest way of dealing with this
problem.
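
For comparison, the same transcoding that iconv performs can be done inside
the JVM without shelling out.  The following is only a sketch in plain Java,
assuming the whole payload fits in memory as a byte[]; IconvLike and
transcode are hypothetical names, not part of Camel:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class IconvLike {

    /** Decodes input using the source charset, then re-encodes it in the target charset. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, as ISO-8859-1 bytes: the accented letter is 0xE9.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};

        // Rough equivalent of: iconv -f ISO-8859-1 -t UTF-8
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // In UTF-8 the same character becomes the two bytes 0xC3 0xA9.
        System.out.println(Arrays.toString(utf8)); // [99, 97, 102, -61, -87]
    }
}
```

The key point is that the source charset has to be named explicitly when
the bytes are decoded; the target charset only matters when the text is
written out again.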

Thanks,
Kev

Re: FileConsumer always reads the data in system charset/encoding

Posted by Kevin Jackson <fo...@gmail.com>.
Hi,

> Use
> .convertBodyTo(String.class, "utf-8") after the from(file:xxx) to
> control the charset used for encoding.

Fantastic - should have asked earlier before digging through the src -
C'est la vie

Kev

Re: FileConsumer always reads the data in system charset/encoding

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Use
.convertBodyTo(String.class, "utf-8") after the from(file:xxx) to
control the charset used for encoding.





