Posted to dev@flume.apache.org by "Suresh Saggar (JIRA)" <ji...@apache.org> on 2012/11/01 20:05:13 UTC
[jira] [Created] (FLUME-1676) ExecSource should provide a
configurable charset
Suresh Saggar created FLUME-1676:
------------------------------------
Summary: ExecSource should provide a configurable charset
Key: FLUME-1676
URL: https://issues.apache.org/jira/browse/FLUME-1676
Project: Flume
Issue Type: Bug
Environment: :~/apache-flume-1.4.0-SNAPSHOT/conf# ../bin/flume-ng version
Flume 1.4.0-SNAPSHOT
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 831a86fc5501a8624b184ea65e53749df31692b8
Compiled by jenkins on Tue Oct 30 03:18:08 UTC 2012
From source with checksum 98685e32b9e500a2305f538b4468faaa
Reporter: Suresh Saggar
The character set is currently not configurable in the exec source - http://flume.apache.org/FlumeUserGuide.html#exec-source
File - https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java
Can somebody please expose the ability to specify character set in the exec source?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494863#comment-13494863 ]
Nitin Verma commented on FLUME-1676:
------------------------------------
The fix was uploaded on 3rd Nov; I am waiting for comments.
> ExecSource should provide a configurable charset
> ------------------------------------------------
>
> Key: FLUME-1676
> URL: https://issues.apache.org/jira/browse/FLUME-1676
> Project: Flume
> Issue Type: Bug
> Affects Versions: notrack
> Environment: :~/apache-flume-1.4.0-SNAPSHOT/conf# ../bin/flume-ng version
> Flume 1.4.0-SNAPSHOT
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: 831a86fc5501a8624b184ea65e53749df31692b8
> Compiled by jenkins on Tue Oct 30 03:18:08 UTC 2012
> From source with checksum 98685e32b9e500a2305f538b4468faaa
> Reporter: Suresh Saggar
> Labels: patch
> Attachments: flume-1676.patch
>
>
> The character set is currently not configurable in the exec source - http://flume.apache.org/FlumeUserGuide.html#exec-source
> File - https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java
> Can somebody please expose the ability to specify character set in the exec source?
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Suresh Saggar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488948#comment-13488948 ]
Suresh Saggar commented on FLUME-1676:
--------------------------------------
I was trying to set up a Flume NG multi-tier workflow, with agent01 running on a web server using an exec source and an avro sink, and agent02 running on another web server (a collector) using an avro source and an hdfs sink.
Configuration file - https://gist.github.com/3993648
Although the data (here, tail output) was being written to HDFS, when I cat the file I can see some formatting issues. Link to the HDFS output showing the formatting issue - https://gist.github.com/3995476
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489312#comment-13489312 ]
Nitin Verma commented on FLUME-1676:
------------------------------------
There are a few questions around this request.
Before getting to them, I would like to explain a bit about the two charsets under consideration.
Suppose we need to write a²=¼b in ISO-8859-1 (http://en.wikipedia.org/wiki/ISO/IEC_8859-1).
1. a, b and = fall in the ASCII range, so you can type them directly.
2. ² = B2, ¼ = BC in hex.
$ awk ' BEGIN { printf "a%s=%sb\n", "\xB2", "\xBC" } '
a�=�b
Note: If this shows up as a²=¼b, then you are on ISO-8859-1.
Now let us encode the same in UTF-8 (http://en.wikipedia.org/wiki/UTF-8)
Char. number range | UTF-8 octet sequence
(hexadecimal) | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
and so on so forth
The code points of the characters are the same in UTF-8 (² = B2, ¼ = BC), but they have to be encoded, because UTF-8 is not a single-byte charset.
As B2 and BC are > 7F and < 0800, each is encoded in two bytes (110xxxxx 10xxxxxx):
B2 => 1011 0010 => 1100 0010 1011 0010 => C2 B2
BC => 1011 1100 => 1100 0010 1011 1100 => C2 BC
$ awk ' BEGIN { printf "a%s=%sb\n", "\xC2\xB2", "\xC2\xBC" } '
a²=¼b
Note: If this shows up as aÂ²=Â¼b, then you are on ISO-8859-1.
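The two-byte arithmetic above (B2 => C2 B2, BC => C2 BC) can be checked in Java. This is only a sketch: the class name Utf8TwoByteSketch and its helper methods are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8TwoByteSketch {

    // Encode a code point in the range 0x80..0x7FF as 110xxxxx 10xxxxxx.
    static byte[] encodeTwoByte(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte code point");
        }
        byte lead = (byte) (0xC0 | (codePoint >> 6));    // 110 + top 5 bits
        byte trail = (byte) (0x80 | (codePoint & 0x3F)); // 10 + low 6 bits
        return new byte[] {lead, trail};
    }

    // Render bytes as space-separated upper-case hex, e.g. "C2 B2".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        for (int cp : new int[] {0xB2, 0xBC}) {
            byte[] manual = encodeTwoByte(cp);
            // Let the JDK encode the same character and compare.
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.println(hex(manual) + " matches JDK: " + Arrays.equals(manual, jdk));
        }
    }
}
```

The manual bit-shifting is there only to mirror the table; in real code String.getBytes(charset) does the same job.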
iconv tries to make sure it translates bytes in such a way that text in the from-charset displays correctly on a to-charset terminal.
Thus it adds C2 if I do the following.
$ awk ' BEGIN { printf "a%s=%sb\n", "\xB2", "\xBC" } ' | iconv -f "ISO-8859-1" -t "UTF-8"
a²=¼b
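For comparison, roughly the same translation as the iconv call above can be done in Java by decoding with the source charset and re-encoding with the target charset. A minimal sketch; TranscodeSketch and latin1ToUtf8 are hypothetical names, not anything in Flume.

```java
import java.nio.charset.StandardCharsets;

class TranscodeSketch {

    // Decode with the source charset, then re-encode with the target charset.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // a²=¼b as ISO-8859-1 bytes
        byte[] in = {'a', (byte) 0xB2, '=', (byte) 0xBC, 'b'};
        for (byte b : latin1ToUtf8(in)) {
            System.out.printf("%02X ", b);
        }
        System.out.println(); // prints 61 C2 B2 3D C2 BC 62
    }
}
```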
Warning:
There are many charsets around, and not all charsets support all characters, so byte translation is a lossy business. Example below:
$ awk ' BEGIN { print "\xE0\xA5\x90" } ' | iconv -f "UTF-8" -t "ISO-8859-1"
iconv: illegal input sequence at position 0
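The same loss can be seen from Java. The sketch below (class and method names are hypothetical) uses a CharsetEncoder set to REPORT, so an unmappable character raises an exception much like the iconv error above, whereas String.getBytes silently substitutes '?' (3F).

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class LossyEncodeSketch {

    // Returns true if every character of text can be encoded in ISO-8859-1.
    static boolean fitsLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false; // at least one character has no ISO-8859-1 byte
        }
    }

    public static void main(String[] args) {
        // E0 A5 90 is the UTF-8 encoding of U+0950, which ISO-8859-1 lacks.
        String om = new String(
                new byte[] {(byte) 0xE0, (byte) 0xA5, (byte) 0x90}, StandardCharsets.UTF_8);
        System.out.println(fitsLatin1("a\u00B2=\u00BCb")); // true
        System.out.println(fitsLatin1(om));                // false
        // getBytes does not fail; it replaces the character with '?'.
        System.out.printf("%02X%n", om.getBytes(StandardCharsets.ISO_8859_1)[0]); // 3F
    }
}
```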
Considering all of the above, I feel
Flume should concentrate on transferring bytes unchanged from one system to another, not on translating them. If the charsets of the two systems differ, then
source system: cat $file
sink system: cat $file | iconv -f source-charset -t sink-charset
should show the same visible output, as long as sink-charset defines all the characters used from source-charset.
** The only guarantee Flume should give is that the bytes delivered to the sink are the same as the bytes given to the source **
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Roshan Naik (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509367#comment-13509367 ]
Roshan Naik commented on FLUME-1676:
------------------------------------
It would be nice to have this on Review Board.
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Mike Percy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488942#comment-13488942 ]
Mike Percy commented on FLUME-1676:
-----------------------------------
Talked to Suresh about this on IRC. An example would be using the exec source with a tail -F command on a file that is ISO-8859 encoded.
[jira] [Updated] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nitin Verma updated FLUME-1676:
-------------------------------
Attachment: flume-1676.patch
Attaching the patch; please review.
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489504#comment-13489504 ]
Nitin Verma commented on FLUME-1676:
------------------------------------
Hi Mike,
I did some testing on constructing Java strings from ISO-8859-1 bytes. Because a Java String decodes the given bytes into UTF-16, the conversion is lossy if the charset is not the right one (the default here is UTF-8).
For Flume, we should ingest bytes into strings and egest bytes from strings using the charset, so that the channel gets the same bytes the user's source had, and likewise the sink.
string = new String(bytes, charset);
string.getBytes(charset);
TODO: I will run similar tests on streams.
Java Test Code
{code:java}
package edu.nitin.testcodes;

import java.nio.charset.Charset;

import org.testng.annotations.Test;

public class CharsetTest {

    @Test
    public void testCharset() {
        final byte[] bytes = new byte[]{(byte) 0x40, (byte) 0xC2, (byte) 0xE6, (byte) 0x40};
        final Charset charset = Charset.forName("ISO-8859-1");
        System.out.println("Input bytes");
        print(bytes);

        System.out.println("ingest using charset");
        {
            final String string = new String(bytes, charset);
            System.out.println(string);
            print(string.getBytes());        // encoded with the platform default charset
            print(string.getBytes(charset)); // encoded with the charset used for decoding
        }

        System.out.println("ingest without using charset");
        {
            final String string = new String(bytes); // decoded with the platform default charset
            System.out.println(string);
            print(string.getBytes());
            print(string.getBytes(charset));
        }
    }

    // Print each byte as two upper-case hex digits.
    private void print(final byte[] bytes) {
        for (byte b : bytes) {
            System.out.printf(" %02X", b);
        }
        System.out.println();
    }
}
{code}
Output
{code}
Input bytes
40 C2 E6 40
ingest using charset
@Âæ@
40 C3 82 C3 A6 40
40 C2 E6 40
ingest without using charset
@��
40 EF BF BD EF BF BD
40 3F 3F
{code}
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489544#comment-13489544 ]
Nitin Verma commented on FLUME-1676:
------------------------------------
Hi Mike,
InputStreamReader needs to know the charset; otherwise readLine decodes the bytes with the platform default charset and corrupts them.
bufferedReader = new BufferedReader(new InputStreamReader(byteArrayInputStream, charset));
bufferedReader.readLine().getBytes(charset);
{code:java}
package edu.nitin.testcodes;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

import org.testng.annotations.Test;

public class CharsetStreamTest {

    @Test
    public void testCharset() throws IOException {
        final byte[] bytes = new byte[]{
            (byte) 0x40, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x41, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x42, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x43, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x44, (byte) 0xC2, (byte) 0xE6, (byte) 0x40
        };
        final Charset charset = Charset.forName("ISO-8859-1");
        System.out.println("Input bytes");
        print(bytes);

        System.out.println("ingest using charset");
        {
            final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
            final BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(byteArrayInputStream, charset));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                print(line.getBytes(charset));
            }
        }

        System.out.println("ingest without using charset");
        {
            final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
            // No charset given: the reader decodes with the platform default.
            final BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(byteArrayInputStream));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                print(line.getBytes(charset));
            }
        }
    }

    // Print each byte as two upper-case hex digits.
    private void print(final byte[] bytes) {
        for (byte b : bytes) {
            System.out.printf(" %02X", b);
        }
        System.out.println();
    }
}
{code}
{code}
Input bytes
40 C2 E6 40 0A 41 C2 E6 40 0A 42 C2 E6 40 0A 43 C2 E6 40 0A 44 C2 E6 40
ingest using charset
40 C2 E6 40
41 C2 E6 40
42 C2 E6 40
43 C2 E6 40
44 C2 E6 40
ingest without using charset
40 3F 3F 40
41 3F 3F 40
42 3F 3F 40
43 3F 3F 40
44 3F 3F
{code}
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Mike Percy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489324#comment-13489324 ]
Mike Percy commented on FLUME-1676:
-----------------------------------
Nitin: That is the guarantee Flume provides. I believe the request is the following:
1. Provide a way to specify the charset that is provided on the terminal to Flume, so it knows how to decode it into a String.
2. Provide a way to specify the charset we will store in the Flume Event object itself, when we encode the String into binary.
Without specifying these things, the user has no control over how Flume interprets his data.
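The two settings described above could look roughly like this in plain JDK terms. This is only a sketch under assumptions: the class TwoCharsetSketch, the method readBodies and the parameter names inputCharset/outputCharset are hypothetical and are not taken from ExecSource or from the attached patch; the byte arrays returned stand in for what would become Flume Event bodies.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TwoCharsetSketch {

    // Decode each line with inputCharset (point 1), then encode it
    // with outputCharset to get the bytes for the event body (point 2).
    static List<byte[]> readBodies(InputStream in, Charset inputCharset, Charset outputCharset) {
        List<byte[]> bodies = new ArrayList<>();
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, inputCharset));
            String line;
            while ((line = reader.readLine()) != null) {
                bodies.add(line.getBytes(outputCharset));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bodies;
    }

    public static void main(String[] args) {
        byte[] input = {0x40, (byte) 0xC2, (byte) 0xE6, 0x40, '\n'};
        // Same charset on both sides: the body bytes equal the input line.
        // ISO-8859-1 in, UTF-8 out: the body is the transcoded line.
        for (Charset out : new Charset[] {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8}) {
            List<byte[]> bodies =
                    readBodies(new ByteArrayInputStream(input), StandardCharsets.ISO_8859_1, out);
            for (byte b : bodies.get(0)) {
                System.out.printf(" %02X", b);
            }
            System.out.println();
        }
    }
}
```

When both charsets are the same and match the real encoding of the command output, the body carries the original bytes of each line, which keeps the byte-for-byte guarantee discussed earlier in the thread.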
[jira] [Comment Edited] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Mike Percy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489324#comment-13489324 ]
Mike Percy edited comment on FLUME-1676 at 11/2/12 9:26 AM:
------------------------------------------------------------
Nitin: That is the guarantee Flume provides as a framework. I believe the request is the following:
1. Provide a way to specify the charset that is provided on the terminal to Flume, so that the Exec Source knows how to decode it into a String.
2. Provide a way to specify the charset we will store in the Flume Event object itself, when the Exec Source encodes the String into binary form using EventBuilder.
Without the capability to specify these encodings, a user doesn't have enough control over how the Exec Source interprets his text input data.
(Edit: clarifications)
[jira] [Commented] (FLUME-1676) ExecSource should provide a
configurable charset
Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489555#comment-13489555 ]
Nitin Verma commented on FLUME-1676:
------------------------------------
So there are two ways to deal with these bytes:
1. Do not use String/Reader; that is, deal with InputStream/byte[] directly.
2. Make String/Reader charset-aware.
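Option 1 might look something like the sketch below, which splits the stream on the newline byte without ever building a String. The names ByteLineSketch and splitLines are hypothetical. It assumes a charset in which byte 0A always means newline (true for ASCII, ISO-8859-1 and UTF-8, but not for UTF-16).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ByteLineSketch {

    // Split a stream into lines on '\n' (0x0A) without decoding any bytes.
    static List<byte[]> splitLines(InputStream in) {
        List<byte[]> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lines.add(current.toByteArray());
                    current.reset();
                } else {
                    current.write(b);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.size() > 0) {
            lines.add(current.toByteArray()); // last line had no trailing newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] input = {
            0x40, (byte) 0xC2, (byte) 0xE6, 0x40, '\n',
            0x44, (byte) 0xC2, (byte) 0xE6, 0x40
        };
        for (byte[] line : splitLines(new ByteArrayInputStream(input))) {
            for (byte b : line) {
                System.out.printf(" %02X", b);
            }
            System.out.println();
        }
    }
}
```

This reads one byte at a time for clarity; wrapping the input in a BufferedInputStream would be the obvious next step for real use. Option 2 corresponds to passing the charset to InputStreamReader and getBytes, as in the stream test earlier in the thread.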