Posted to users@camel.apache.org by hefiso <he...@gmail.com> on 2012/12/04 13:29:07 UTC

Problem w. FTP producer and charset

Hi

It seems that the FTP producer is no longer able to write files using my
favorite charset "iso-8859-1". The latest version where it worked as
expected is 2.9.2 but from 2.9.3 and up the files are not written using the
specified charset.

I have attached a simple test showing this error. The route is:

<route id="foo-Route">
    <from uri="direct:in" />
    <convertBodyTo type="java.lang.String" charset="iso-8859-1" />
    <setHeader headerName="CamelFileName"><constant>test_encoding.txt</constant></setHeader>
    <to uri="ftp://test:test@127.0.0.1:10017/?charset=iso-8859-1&amp;disconnect=true&amp;stepwise=false" />
</route>

and the test goes like this:

	@Test
	public void testFtpFileEncoding() throws InterruptedException, IOException {
		String payload = "<foo>Halløj</foo>";
		template.sendBody("direct:in", payload);
		Thread.sleep(2000);
		
		FileSystem fileSystem = fakeFtpServer.getFileSystem();
		FileEntry entry = (FileEntry)fileSystem.getEntry("/test_encoding.txt");
		InputStream inputStream = entry.createInputStream();

		StringWriter writer = new StringWriter();
		IOUtils.copy(inputStream, writer, "iso-8859-1");
		String res = writer.toString();
		
		assertEquals(payload, res);
	}

When using Apache Camel 2.9.2 this works well. After upgrading to 2.9.3 or
higher (I have also tested 2.10.0 through 2.10.3) it fails because the file
gets written using utf-8 and not iso-8859-1 as specified :-/ You can test
this by changing the Camel version in pom.xml.
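
To illustrate why the assertion fails, here is a minimal, Camel-free sketch of what happens when text is written as utf-8 and read back as iso-8859-1. The class and method names are hypothetical and only serve the illustration; nothing here comes from the attached test project.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    // Encode with one charset and decode with another, which is what happens
    // when the producer writes utf-8 but the test reads iso-8859-1.
    static String writeThenRead(String text, Charset written, Charset read) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, read);
    }

    public static void main(String[] args) {
        // \u00f8 is the "o with stroke" from the payload in the test; the
        // escape keeps the example independent of the source file encoding.
        String payload = "<foo>Hall\u00f8j</foo>";

        // The letter takes one byte in iso-8859-1 and two bytes in utf-8.
        System.out.println(payload.getBytes(StandardCharsets.ISO_8859_1).length);
        System.out.println(payload.getBytes(StandardCharsets.UTF_8).length);

        // Matching charsets round-trip; mismatched charsets do not.
        System.out.println(payload.equals(
                writeThenRead(payload, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)));
        System.out.println(payload.equals(
                writeThenRead(payload, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
    }
}
```

The last comparison corresponds to the failing assertEquals in the test: the bytes on the FTP server are utf-8, so decoding them as iso-8859-1 does not give back the original payload.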

The problem is not related to the MockFtpServer used in the JUnit-test as
the problem also exists when writing files to "real" FTP servers.

Do you have a fix or workaround for this? ftp-producer-problem.zip
<http://camel.465427.n5.nabble.com/file/n5723604/ftp-producer-problem.zip>  

Best regards
Henrik




--
View this message in context: http://camel.465427.n5.nabble.com/Problem-w-FTP-producer-and-charset-tp5723604.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Problem w. FTP producer and charset

Posted by "Preben.Asmussen" <pr...@dr.dk>.
I created https://issues.apache.org/jira/browse/CAMEL-5881

Will work on a patch and attach it later.

Best regards
Preben




Re: Problem w. FTP producer and charset

Posted by Claus Ibsen <cl...@gmail.com>.
On Tue, Dec 11, 2012 at 11:02 AM, hefiso
<he...@gmail.com> wrote:
> Hi Preben
>
> Thanks for the workaround! Indeed, using
>
> <setProperty propertyName="CHARSET_NAME"><constant>iso-8859-1</constant></setProperty>
>
> makes the FTP-producer respect the requested charset.
>
> However, I think the best solution is to make the FTP-component behave just
> like  File2 so it is possible to specify an encoding directly on the
> endpoint ("?charset=iso-8859-1").
>

Yeah, it's best if ftp/file work the same.
Feel free to log a JIRA.

And as always we love contributions
http://camel.apache.org/contributing.html


> Best Regards
> Henrik
>
>
>



-- 
Claus Ibsen
-----------------
Red Hat, Inc.
FuseSource is now part of Red Hat
Email: cibsen@redhat.com
Web: http://fusesource.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen

Re: Problem w. FTP producer and charset

Posted by hefiso <he...@gmail.com>.
Hi Preben

Thanks for the workaround! Indeed, using 

<setProperty propertyName="CHARSET_NAME"><constant>iso-8859-1</constant></setProperty>

makes the FTP-producer respect the requested charset.

However, I think the best solution is to make the FTP-component behave just
like  File2 so it is possible to specify an encoding directly on the
endpoint ("?charset=iso-8859-1").

Best Regards
Henrik




Re: Problem w. FTP producer and charset

Posted by "Preben.Asmussen" <pr...@dr.dk>.
I have dug a bit deeper.

The problem is that FTP files get written in the wrong charset even if you
specify the charset on the ftp endpoint, as in
to("ftp://test:test@127.0.0.1:10017/?charset=iso-8859-1"), or do a
convertBodyTo(String.class, "iso-8859-1") before sending to the ftp
endpoint.

The root problem lies in FtpOperations.doStoreFile, where the call
    if (is == null) {
        is = exchange.getIn().getMandatoryBody(InputStream.class);
    }
goes through DefaultTypeConverter.mandatoryConvertTo and ends up calling
IOConverter.toInputStream. That converter encodes the body in the desired
charset, BUT it takes the charset from the CamelCharsetName property on the
Exchange.

The doStoreFile operation doesn't take the charset endpoint property into
account when constructing the InputStream.


Up until Camel 2.9.2 this worked because calling convertBodyTo(String.class,
"iso-8859-1") before sending to the ftp endpoint went through
ConvertBodyProcessor, which set the Exchange.CHARSET_NAME property:

snip --

public void process(Exchange exchange) throws Exception {
        ....
        if (charset != null) {
            exchange.setProperty(Exchange.CHARSET_NAME, IOHelper.normalizeCharset(charset));
        }
        ....

But from 2.9.3 there has been a change in the process method to
        ...
         // remove charset when we are done as we should not propagate that,
        // as that can lead to double converting later on
        if (charset != null) {
            exchange.removeProperty(Exchange.CHARSET_NAME);
        }
        ....

I think FtpOperations.doStoreFile is missing the handling of the charset
property that the file component implements in FileOperations.storeFile.

One way to fix it is to change doStoreFile to:

....
            if (is == null) {
                String charset = endpoint.getCharset();
                if (charset != null) {
                    is = new ByteArrayInputStream(exchange.getIn().getBody(String.class).getBytes(charset));
                    log.trace("Using InputStream {} with charset {}.", is, charset);
                } else {
                    is = exchange.getIn().getMandatoryBody(InputStream.class);
                }
            }
....

But I guess there is a more Camel-like way of converting the body that takes
both the charset and the content into account.
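
For illustration, the core of the proposed change can be sketched without any Camel classes. The helper below is hypothetical (it is not part of Camel or of the patch) and adds one guard that the snippet above lacks: if the body cannot be converted to a String, getBody(String.class) may return null, and calling getBytes on it would fail.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;

class CharsetAwareStream {

    // Hypothetical helper: encodes the body with the endpoint charset when
    // both are available. Returns null otherwise, so the caller can fall
    // back to its regular InputStream conversion.
    static InputStream toStream(String body, String endpointCharset) {
        if (body == null || endpointCharset == null) {
            return null;
        }
        // Charset.forName throws an unchecked exception for unknown names.
        return new ByteArrayInputStream(body.getBytes(Charset.forName(endpointCharset)));
    }

    public static void main(String[] args) {
        String body = "<foo>Hall\u00f8j</foo>";
        // available() on a ByteArrayInputStream is the number of bytes left to read.
        System.out.println(toStream(body, "iso-8859-1").available());
        System.out.println(toStream(body, "utf-8").available());
    }
}
```

The two streams differ by one byte because the non-ASCII letter needs two bytes in utf-8, which is the difference the test in this thread detects.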

Best 

Preben





Re: Problem w. FTP producer and charset

Posted by "Preben.Asmussen" <pr...@dr.dk>.
I added a bit more to the testcase in the attached zip.

It now completes OK when run on 2.9.2.

When I change the Camel version in the pom to something newer, 2.9.3 and
upwards, it fails because the CamelCharsetName property is not present.

ftp-producer-bug.zip
<http://camel.465427.n5.nabble.com/file/n5723765/ftp-producer-bug.zip>  




Re: Problem w. FTP producer and charset

Posted by "Preben.Asmussen" <pr...@dr.dk>.
Hi Willem

I think the testcase is attached as a zip. Just run it and you will see
that it fails under Camel 2.9.3 and completes under 2.9.2 when you change
the camel.version in the pom.

The problem is that files don't get written in the correct charset even if
you specify the charset on the file/ftp endpoint, e.g. when the body is in
utf-8 and you want to write a file in iso-8859-1.

The root problem is that from version 2.9.3 upwards the exchange property
CamelCharsetName is not set, even though a charset is configured on the
file/ftp producer endpoint.

At the moment a workaround is to set the CamelCharsetName property manually.

Best
Preben Asmussen






Re: Problem w. FTP producer and charset

Posted by Willem jiang <wi...@gmail.com>.
Can you tell us what issue you are running into?
I don't quite understand your question.
It would be easier if you could show us a small test case.


--  
Willem Jiang

Red Hat, Inc.
FuseSource is now part of Red Hat
Web: http://www.fusesource.com | http://www.redhat.com
Blog: http://willemjiang.blogspot.com (http://willemjiang.blogspot.com/) (English)
          http://jnn.iteye.com (http://jnn.javaeye.com/) (Chinese)
Twitter: willemjiang  
Weibo: 姜宁willem





On Friday, December 7, 2012 at 4:38 AM, Preben.Asmussen wrote:

> Hi
>  
> Located this to be a problem introduced between 2.9.2 and 2.9.3 as a missing
> CamelCharsetName property on the Exchange.  
> The CamelCharsetName property is not set after 2.9.2, and when
> IOConverter.toInputStream is called to convert exchange body to inputstream
> it uses IOHelper to get the CharsetName for the stream construction.  
>  
> Since it is not set as a property on the Exchange it uses the fallback
> method getDefaultCharsetName, which returns the platform default charset.
>  
> Seems there has been a change on 2.9.3 since documentation of charset File2
> property states :
> Camel 2.9.3: this option is used to specify the encoding of the file, and
> camel will set the Exchange property with Exchange.CHARSET_NAME with the
> value of this option. You can use this on the consumer, to specify the
> encodings of the files, which allow Camel to know the charset it should load
> the file content in case the file content is being accessed. Likewise when
> writing a file, you can use this option to specify which charset to write
> the file as well. See further below for a examples and more important
> details.
>  
>  
> This must be a bug. Should I raise a jira ?
>  
> /Preben
>  
>  
>  




Re: Problem w. FTP producer and charset

Posted by "Preben.Asmussen" <pr...@dr.dk>.
Hi

I have traced this to a change between 2.9.2 and 2.9.3: the CamelCharsetName
property is missing on the Exchange.
The property is no longer set after 2.9.2, and when
IOConverter.toInputStream is called to convert the exchange body to an
InputStream, it uses IOHelper to get the charset name for the stream
construction.

Since the property is not set on the Exchange, it uses the fallback
method getDefaultCharsetName, which returns the platform default charset.
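
The lookup-with-fallback behaviour described here can be sketched in plain Java as follows. This is a hypothetical stand-in that mirrors the description in this message, not Camel's actual IOHelper code, which may resolve its default differently; a plain Map plays the role of the Exchange properties.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

class CharsetLookupSketch {

    // Same key as the exchange property named in this thread.
    static final String CHARSET_PROPERTY = "CamelCharsetName";

    // Hypothetical stand-in for the lookup: use the exchange property if it
    // is present, otherwise fall back to the platform default charset.
    static String resolveCharset(Map<String, Object> exchangeProperties) {
        Object value = exchangeProperties.get(CHARSET_PROPERTY);
        if (value != null) {
            return value.toString();
        }
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new HashMap<String, Object>();

        // Property missing (the 2.9.3+ situation): the default is used.
        System.out.println(resolveCharset(properties));

        // Property set manually (the workaround): the requested charset wins.
        properties.put(CHARSET_PROPERTY, "iso-8859-1");
        System.out.println(resolveCharset(properties));
    }
}
```

Under this reading, an endpoint charset that never reaches the exchange property has no effect on the conversion, which matches the observed behaviour.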

There seems to have been a change in 2.9.3, since the documentation of the
File2 charset property states:
Camel 2.9.3: this option is used to specify the encoding of the file, and
camel will set the Exchange property with Exchange.CHARSET_NAME with the
value of this option. You can use this on the consumer, to specify the
encodings of the files, which allow Camel to know the charset it should load
the file content in case the file content is being accessed. Likewise when
writing a file, you can use this option to specify which charset to write
the file as well. See further below for a examples and more important
details.


This must be a bug. Should I raise a jira ?

/Preben


