Posted to dev@flume.apache.org by "Suresh Saggar (JIRA)" <ji...@apache.org> on 2012/11/01 20:05:13 UTC

[jira] [Created] (FLUME-1676) ExecSource should provide a configurable charset

Suresh Saggar created FLUME-1676:
------------------------------------

             Summary: ExecSource should provide a configurable charset
                 Key: FLUME-1676
                 URL: https://issues.apache.org/jira/browse/FLUME-1676
             Project: Flume
          Issue Type: Bug
         Environment: :~/apache-flume-1.4.0-SNAPSHOT/conf# ../bin/flume-ng version
Flume 1.4.0-SNAPSHOT
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 831a86fc5501a8624b184ea65e53749df31692b8
Compiled by jenkins on Tue Oct 30 03:18:08 UTC 2012
From source with checksum 98685e32b9e500a2305f538b4468faaa
            Reporter: Suresh Saggar


The character set is currently not configurable in the exec source - http://flume.apache.org/FlumeUserGuide.html#exec-source

File - https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java

Can somebody please expose the ability to specify character set in the exec source?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494863#comment-13494863 ] 

Nitin Verma commented on FLUME-1676:
------------------------------------

The fix was uploaded on 3rd Nov; I am waiting for comments.
                
> ExecSource should provide a configurable charset
> ------------------------------------------------
>
>                 Key: FLUME-1676
>                 URL: https://issues.apache.org/jira/browse/FLUME-1676
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: notrack
>         Environment: :~/apache-flume-1.4.0-SNAPSHOT/conf# ../bin/flume-ng version
> Flume 1.4.0-SNAPSHOT
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: 831a86fc5501a8624b184ea65e53749df31692b8
> Compiled by jenkins on Tue Oct 30 03:18:08 UTC 2012
> From source with checksum 98685e32b9e500a2305f538b4468faaa
>            Reporter: Suresh Saggar
>              Labels: patch
>         Attachments: flume-1676.patch
>
>
> The character set is currently not configurable in the exec source - http://flume.apache.org/FlumeUserGuide.html#exec-source
> File - https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java
> Can somebody please expose the ability to specify character set in the exec source?


[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Suresh Saggar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488948#comment-13488948 ] 

Suresh Saggar commented on FLUME-1676:
--------------------------------------

I was trying to set up a Flume NG multi-tier workflow, with agent01 running on a web server using an exec source and an avro sink, and agent02 running on another web server (a collector) using an avro source and an hdfs sink.

Configuration file - https://gist.github.com/3993648

Although the data (here, tail output) was being written to HDFS, when I cat the file I can see some formatting issues. Link to the HDFS output showing the formatting issue - https://gist.github.com/3995476

                

[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489312#comment-13489312 ] 

Nitin Verma commented on FLUME-1676:
------------------------------------

There are a few questions around this request.

Before that, I would like to explain a bit about the two charsets under consideration.

Suppose we need to write a²=¼b in ISO-8859-1 (http://en.wikipedia.org/wiki/ISO/IEC_8859-1).
1. a, b and = fall in the ASCII range, so you can type them directly.
2. ² = B2, ¼ = BC in hex.

$ awk ' BEGIN { printf "a%s=%sb\n", "\xB2", "\xBC" } ' 
a�=�b

Note: If this shows up as a²=¼b, then your terminal is using ISO-8859-1.

Now let us encode the same in UTF-8 (http://en.wikipedia.org/wiki/UTF-8)

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   (UTF-8 as defined by RFC 3629 stops at four bytes)

The code points of these characters are the same in Unicode (² = B2, ¼ = BC), but UTF-8 is not a single-byte charset, so they have to be encoded.

As B2 and BC are greater than 7F and less than 0800, each is encoded in two bytes (110xxxxx 10xxxxxx):
B2 => 1011 0010 => 1100 0010 1011 0010 => C2 B2
BC => 1011 1100 => 1100 0010 1011 1100 => C2 BC

$ awk ' BEGIN { printf "a%s=%sb\n", "\xC2\xB2", "\xC2\xBC" } ' 
a²=¼b

Note: If this shows up as a²=¼b, then your terminal is using UTF-8.
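
The C2 B2 and C2 BC results above can be cross-checked in Java. This is only a minimal sketch: the class name TwoByteUtf8 and its methods are made up for illustration. It packs a code point in the range 0080-07FF into the 110xxxxx 10xxxxxx pattern by hand and compares the result with the JDK's own UTF-8 encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative helper; the class and method names are assumptions for this sketch.
class TwoByteUtf8 {

    // Encode a code point in the range U+0080..U+07FF as 110xxxxx 10xxxxxx.
    static byte[] encode(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte code point");
        }
        byte lead = (byte) (0xC0 | (codePoint >> 6));    // 110 + top 5 bits
        byte cont = (byte) (0x80 | (codePoint & 0x3F));  // 10 + low 6 bits
        return new byte[] {lead, cont};
    }

    // Format bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(0xB2)));  // superscript two: C2 B2
        System.out.println(hex(encode(0xBC)));  // one quarter:     C2 BC
        // Cross-check against the JDK's UTF-8 encoder.
        System.out.println(Arrays.equals(encode(0xB2),
                "\u00B2".getBytes(StandardCharsets.UTF_8)));  // true
    }
}
```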

iconv tries to make sure it translates bytes in such a way that text written in from-charset displays correctly on a to-charset terminal.

Thus it adds C2 if I do the following.

$ awk ' BEGIN { printf "a%s=%sb\n", "\xB2", "\xBC" } ' | iconv -f "ISO-8859-1" -t "UTF-8"
a²=¼b
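
For comparison, the same ISO-8859-1 to UTF-8 translation can be sketched in Java by decoding with one charset and encoding with the other. The class name Latin1ToUtf8 is hypothetical; only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only; the class and method names are assumptions for this sketch.
class Latin1ToUtf8 {

    // Decode with the source charset, then encode with the target charset.
    static byte[] transcode(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // a, superscript two, =, one quarter, b as ISO-8859-1 bytes
        byte[] in = {0x61, (byte) 0xB2, 0x3D, (byte) 0xBC, 0x62};
        StringBuilder sb = new StringBuilder();
        for (byte b : transcode(in)) {
            sb.append(String.format("%02X ", b));
        }
        System.out.println(sb.toString().trim());  // 61 C2 B2 3D C2 BC 62
    }
}
```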

Warning:
There are many charsets around, and not all charsets support all characters, so translating bytes between charsets can be lossy. Example below:
$ awk ' BEGIN { print "\xE0\xA5\x90" } ' | iconv -f "UTF-8" -t "ISO-8859-1"
iconv: illegal input sequence at position 0
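
The same loss can be observed from Java. In this sketch (class name LossyEncoding is hypothetical) the character U+0950, whose UTF-8 bytes are the E0 A5 90 used above, is checked against ISO-8859-1. Unlike iconv, String.getBytes does not fail; it silently substitutes the charset's replacement byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only; the class and method names are assumptions for this sketch.
class LossyEncoding {

    // True if every character of the text can be represented in the charset.
    static boolean representable(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String om = "\u0950";  // UTF-8 bytes: E0 A5 90
        System.out.println(representable(om, StandardCharsets.UTF_8));       // true
        System.out.println(representable(om, StandardCharsets.ISO_8859_1));  // false
        // getBytes replaces the unmappable character with '?' (0x3F).
        byte[] out = om.getBytes(StandardCharsets.ISO_8859_1);
        System.out.printf("%02X%n", out[0]);  // 3F
    }
}
```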

Considering all of the above, I feel that

Flume should concentrate on transferring data byte for byte from one system to another, not on translating it. If the charsets of the two systems differ, then
source system: cat $file
sink system: cat $file | iconv -f source-charset -t sink-charset
should show the same visible output, provided sink-charset defines all the characters defined in source-charset.

** The only guarantee Flume should give is that the bytes delivered to the sink are the same as the bytes received from the source **




                

[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Roshan Naik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509367#comment-13509367 ] 

Roshan Naik commented on FLUME-1676:
------------------------------------

It would be nice to have it on Review Board.
                

[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Mike Percy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488942#comment-13488942 ] 

Mike Percy commented on FLUME-1676:
-----------------------------------

Talked to Suresh about this on IRC. An example would be using the exec source with a tail -F command on a file that is ISO-8859 encoded.
                

[jira] [Updated] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitin Verma updated FLUME-1676:
-------------------------------

    Attachment: flume-1676.patch

Attaching the patch; please review.
                

[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489504#comment-13489504 ] 

Nitin Verma commented on FLUME-1676:
------------------------------------

Hi Mike,

I did some testing on constructing Java strings from ISO-8859-1 bytes. A Java String translates the given bytes to UTF-16, so if the charset used for decoding is not correct, the conversion is lossy. (The default charset here is UTF-8.)

For Flume, we should ingest and egest bytes through strings using the configured charset, so that the channel gets the same bytes that the user's source had, and likewise the sink.

string = new String(bytes, charset);
string.getBytes(charset);

TODO: I will run similar tests on streams.

Java Test Code
{code:java}
package edu.nitin.testcodes;

import java.nio.charset.Charset;
import org.testng.annotations.Test;

public class CharsetTest {

    @Test
    public void testCharset() {
        final byte[] bytes = new byte[]{(byte) 0x40, (byte) 0xC2, (byte) 0xE6,(byte) 0x40};
        final Charset charset = Charset.forName("ISO-8859-1");
        System.out.println("Input bytes");
        print(bytes);

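        // Decode with the charset the bytes were written in; encoding
        // back with the same charset returns the original bytes.
        System.out.println("ingest using charset");
        {
            final String string = new String(bytes, charset);
            System.out.println(string);
            print(string.getBytes());         // platform default charset
            print(string.getBytes(charset));  // round-trips to the input
        }

        // Decode with the platform default charset; bytes that are not
        // valid in it are replaced and cannot be recovered afterwards.
        System.out.println("ingest without using charset");
        {
            final String string = new String(bytes);
            System.out.println(string);
            print(string.getBytes());
            print(string.getBytes(charset));
        }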

    }

    private void print(final byte bytes[]) {
        for (byte b : bytes) {
            System.out.printf("  %02X", b);
        }
        System.out.println();
    }
}

{code}

Output
{code}
Input bytes
  40  C2  E6  40
ingest using charset
@Âæ@
  40  C3  82  C3  A6  40
  40  C2  E6  40
ingest without using charset
@��
  40  EF  BF  BD  EF  BF  BD
  40  3F  3F
{code}
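
The test above shows that decoding with the wrong charset silently replaces bytes. As a complementary sketch (not part of the attached patch; the class name StrictDecode is hypothetical), a CharsetDecoder configured with CodingErrorAction.REPORT makes the mismatch visible as an exception instead of quietly losing data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative only; the class and method names are assumptions for this sketch.
class StrictDecode {

    // Returns the decoded text, or null if the bytes are not valid in the charset.
    static String decodeOrNull(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;  // decoding with this charset would have been lossy
        }
    }

    public static void main(String[] args) {
        // Same input bytes as the test above.
        byte[] bytes = {0x40, (byte) 0xC2, (byte) 0xE6, 0x40};
        System.out.println(decodeOrNull(bytes, StandardCharsets.UTF_8) == null);       // true
        System.out.println(decodeOrNull(bytes, StandardCharsets.ISO_8859_1) != null);  // true
    }
}
```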

                

[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489544#comment-13489544 ] 

Nitin Verma commented on FLUME-1676:
------------------------------------

Hi Mike,

InputStreamReader needs to be given the charset; otherwise it decodes with the platform default and readLine returns corrupted text.

bufferedReader = new BufferedReader(new InputStreamReader(byteArrayInputStream, charset));
bufferedReader.readLine().getBytes(charset);


{code:java}
package edu.nitin.testcodes;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import org.testng.annotations.Test;

public class CharsetStreamTest {

    @Test
    public void testCharset() throws IOException {
        final byte[] bytes = new byte[]{
            (byte) 0x40, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x41, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x42, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x43, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x44, (byte) 0xC2, (byte) 0xE6, (byte) 0x40
        };

        final Charset charset = Charset.forName("ISO-8859-1");
        System.out.println("Input bytes");
        print(bytes);

        System.out.println("ingest using charset");
        {
            final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);

            final BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(byteArrayInputStream, charset));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                print(line.getBytes(charset));
            }
        }

        System.out.println("ingest without using charset");
        {
            final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);

            final BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(byteArrayInputStream));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                print(line.getBytes(charset));
            }
        }

    }

    private void print(final byte bytes[]) {
        for (byte b : bytes) {
            System.out.printf("  %02X", b);
        }
        System.out.println();
    }
}
{code}

{code}
Input bytes
  40  C2  E6  40  0A  41  C2  E6  40  0A  42  C2  E6  40  0A  43  C2  E6  40  0A  44  C2  E6  40
ingest using charset
  40  C2  E6  40
  41  C2  E6  40
  42  C2  E6  40
  43  C2  E6  40
  44  C2  E6  40
ingest without using charset
  40  3F  3F  40
  41  3F  3F  40
  42  3F  3F  40
  43  3F  3F  40
  44  3F  3F
{code}
                

[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Mike Percy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489324#comment-13489324 ] 

Mike Percy commented on FLUME-1676:
-----------------------------------

Nitin: That is the guarantee Flume provides. I believe the request is the following:
1. Provide a way to specify the charset that is provided on the terminal to Flume, so it knows how to decode it into a String.
2. Provide a way to specify the charset we will store in the Flume Event object itself, when we encode the String into binary.

Without specifying these things, the user has no control over how Flume interprets his data.
                

[jira] [Comment Edited] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Mike Percy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489324#comment-13489324 ] 

Mike Percy edited comment on FLUME-1676 at 11/2/12 9:26 AM:
------------------------------------------------------------

Nitin: That is the guarantee Flume provides as a framework. I believe the request is the following:

1. Provide a way to specify the charset that is provided on the terminal to Flume, so that the Exec Source knows how to decode it into a String.
2. Provide a way to specify the charset we will store in the Flume Event object itself, when the Exec Source encodes the String into binary form using EventBuilder.

Without the capability to specify these encodings, a user doesn't have enough control over how the Exec Source interprets his text input data.
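
The two settings described above could look roughly like the following. This is a hypothetical sketch, not Flume code and not the attached patch: the class TwoCharsetReader and its parameters inputCharset and outputCharset are invented here to illustrate decoding the command output with one charset and encoding each line with another before it would become an event body.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Flume.
class TwoCharsetReader {

    // Decode lines with inputCharset (point 1), encode each with outputCharset (point 2).
    static List<byte[]> readBodies(InputStream in, Charset inputCharset,
                                   Charset outputCharset) throws IOException {
        List<byte[]> bodies = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, inputCharset));
        String line;
        while ((line = reader.readLine()) != null) {
            bodies.add(line.getBytes(outputCharset));
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = {0x40, (byte) 0xC2, (byte) 0xE6, 0x40, '\n',
                        0x41, (byte) 0xC2, (byte) 0xE6, 0x40};
        List<byte[]> bodies = readBodies(new ByteArrayInputStream(input),
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        for (byte[] body : bodies) {
            StringBuilder sb = new StringBuilder();
            for (byte b : body) {
                sb.append(String.format("%02X ", b));
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

Using the same charset for both parameters returns the original bytes of each line, which is the round trip discussed elsewhere in this thread.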

(Edit: clarifications)
                
                  

[jira] [Commented] (FLUME-1676) ExecSource should provide a configurable charset

Posted by "Nitin Verma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489555#comment-13489555 ] 

Nitin Verma commented on FLUME-1676:
------------------------------------

So there are two ways to deal with these bytes:
1. Do not use String/Reader at all; that is, work directly with InputStream/byte[].
2. Make String/Reader charset-aware.
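
Option 1 could be sketched as below; option 2 is what the BufferedReader/InputStreamReader test earlier in this thread exercises. The class name ByteLineSplitter is hypothetical. The sketch assumes lines end with a single 0x0A byte and that the encoding never uses 0x0A inside a multi-byte character (true for ISO-8859-1 and UTF-8, not for UTF-16); it does not treat 0x0D specially.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative only; the class and method names are assumptions for this sketch.
class ByteLineSplitter {

    // Split the stream on 0x0A without ever decoding the bytes into a String.
    static List<byte[]> splitLines(InputStream in) throws IOException {
        List<byte[]> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                lines.add(current.toByteArray());
                current.reset();
            } else {
                current.write(b);
            }
        }
        if (current.size() > 0) {
            lines.add(current.toByteArray());  // last line without a trailing newline
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = {0x40, (byte) 0xC2, (byte) 0xE6, 0x40, '\n',
                        0x44, (byte) 0xC2, (byte) 0xE6, 0x40};
        for (byte[] line : splitLines(new ByteArrayInputStream(input))) {
            StringBuilder sb = new StringBuilder();
            for (byte x : line) {
                sb.append(String.format("%02X ", x));
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

Because no charset is involved, the bytes of each line are exactly the bytes that were read, whatever encoding the producer used.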
                