Posted to dev@flume.apache.org by Attila Simon <sa...@cloudera.com> on 2017/05/16 16:44:22 UTC

Re: I want discuss something about flume

Hi Lisheng Xia,

I like this feature! I would like to add the dev list to this conversation
so others can express their opinions as well. After all, what the community
says is what really matters. There we can discuss your proposal in detail,
as well as whether there is a library that could help with the unit
conversion, e.g. javax.measure
<http://jscience.org/api/javax/measure/unit/package-summary.html>.

In my opinion, it is appropriate:

   - to keep a configuration property name containing "byte" while still
   allowing the value to be specified with units, as long as the
   documentation makes clear what each unit means (e.g. please see the GiB
   vs. GB definitions here <https://en.wikipedia.org/wiki/Gigabyte>);
   - to extend this feature to time values (Hour, Minute, Second,
   Millisecond...);
   - to make conversion failures instantly and clearly visible to the user
   by throwing an exception. I think "< Agent >.configuration.error =
   defaultValue/stopRunning" would be even better, but much harder to
   implement.
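
To make the GiB vs. GB point concrete, here is a minimal sketch of a parser
that keeps the two apart, assuming SI suffixes (KB, MB, GB, TB) mean powers
of 1000 and IEC suffixes (KiB, MiB, GiB, TiB) mean powers of 1024. The class
SizeUnits and its method parseBytes are hypothetical, not existing Flume API;
unlike the converter proposed below, this sketch requires the trailing "B"
and accepts whole numbers only.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical size parser: SI suffixes are powers of 1000, IEC suffixes are
 * powers of 1024, and a bare number means bytes.
 */
final class SizeUnits {
    // A whole number, optional whitespace, then an optional unit such as "GB" or "GiB".
    private static final Pattern SIZE =
        Pattern.compile("(\\d+)\\s*([KMGT]i?B|B)?", Pattern.CASE_INSENSITIVE);

    private static final Map<String, Long> MULTIPLIERS = Map.of(
        "B", 1L,
        "KB", 1_000L, "MB", 1_000_000L, "GB", 1_000_000_000L, "TB", 1_000_000_000_000L,
        "KIB", 1L << 10, "MIB", 1L << 20, "GIB", 1L << 30, "TIB", 1L << 40);

    static long parseBytes(String value) {
        Matcher m = SIZE.matcher(value.trim());
        if (!m.matches()) {
            // Fail fast instead of silently falling back to a default.
            throw new IllegalArgumentException("Invalid size: \"" + value + "\"");
        }
        long amount = Long.parseLong(m.group(1));
        String unit = m.group(2) == null ? "B" : m.group(2).toUpperCase(Locale.ROOT);
        // multiplyExact throws ArithmeticException on overflow rather than wrapping around.
        return Math.multiplyExact(amount, MULTIPLIERS.get(unit));
    }
}
```

Under this convention, the "2g" example below (2147483648 bytes) would be
written 2GiB, while 2GB would mean 2000000000 bytes.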


Cheers,
Attila


On Sun, May 14, 2017 at 1:02 PM, 小夏 <ia...@163.com> wrote:

> Dear Attila Simon,
>      I use Flume at work. While configuring it, I found that some
> configuration parameters are troublesome to set and hard to read, such as
> the Memory Channel's byteCapacity and the File Channel's
> transactionCapacity and maxFileSize; basically, the capacity settings.
> When configuring them, I have to calculate the value in bytes by hand,
> e.g. turn 2g into 2147483648, and I must add a comment next to it;
> otherwise, the next time I read the file, it is not obvious that the value
> means 2g.
>     So I wrote a simple method that converts a human-readable capacity
> into the corresponding number of bytes, as follows:
> ------------------------------
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> public class Utils {
>   // Optional "<n>g", "<n>m" and "<n>k" parts (decimals allowed, in that
>   // order), then an optional whole number of bytes with an optional suffix.
>   private static final Pattern PATTERN = Pattern.compile(
>       "((?<gb>\\d+(\\.\\d+)?)(g|G|gb|GB))?"
>           + "((?<mb>\\d+(\\.\\d+)?)(m|M|mb|MB))?"
>           + "((?<kb>\\d+(\\.\\d+)?)(k|K|kb|KB))?"
>           + "((?<b>\\d+)(b|B|byte|BYTE)?)?");
>   private static final int RATE = 1024;
>
>   public static Long string2Bytes(String in) {
>     return string2Bytes(in, null);
>   }
>
>   public static Long string2Bytes(String in, Long defaultValue) {
>     // A missing or blank value means "not configured": use the default.
>     if (in == null || in.trim().isEmpty()) {
>       return defaultValue;
>     }
>     in = in.trim();
>     Matcher matcher = PATTERN.matcher(in);
>     if (!matcher.matches()) {
>       throw new IllegalArgumentException("the param " + in + " is not a valid capacity");
>     }
>     long bytes = 0;
>     String gb = matcher.group("gb");
>     String mb = matcher.group("mb");
>     String kb = matcher.group("kb");
>     String b = matcher.group("b");
>     if (gb != null) {
>       bytes += Math.round(Double.parseDouble(gb) * Math.pow(RATE, 3));
>     }
>     if (mb != null) {
>       bytes += Math.round(Double.parseDouble(mb) * Math.pow(RATE, 2));
>     }
>     if (kb != null) {
>       bytes += Math.round(Double.parseDouble(kb) * RATE);
>     }
>     if (b != null) {
>       // Long rather than Integer, so byte counts above 2^31 - 1 do not fail.
>       bytes += Long.parseLong(b);
>     }
>     return bytes;
>   }
> }
> Below are the test classes:
> import static org.junit.Assert.assertEquals;
>
> import java.util.Arrays;
> import java.util.Collection;
> import org.junit.Test;
> import org.junit.runner.RunWith;
> import org.junit.runners.Parameterized;
> import org.junit.runners.Parameterized.Parameters;
>
> @RunWith(Parameterized.class)
> public class UtilsTest {
>   private final String param;
>   private final Long result;
>
>   public UtilsTest(String param, Long result) {
>     this.param = param;
>     this.result = result;
>   }
>
>   @Parameters
>   public static Collection<Object[]> data() {
>     return Arrays.asList(new Object[][] {
>         {"", null},
>         {"  ", null},
>         {"2g", 1L*2*1024*1024*1024},
>         {"2G", 1L*2*1024*1024*1024},
>         {"2gb", 1L*2*1024*1024*1024},
>         {"2GB", 1L*2*1024*1024*1024},
>         {"2000m", 1L*2000*1024*1024},
>         {"2000mb", 1L*2000*1024*1024},
>         {"2000M", 1L*2000*1024*1024},
>         {"2000MB", 1L*2000*1024*1024},
>         {"1000k", 1L*1000*1024},
>         {"1000kb", 1L*1000*1024},
>         {"1000K", 1L*1000*1024},
>         {"1000KB", 1L*1000*1024},
>         {"1000", 1L*1000},
>         {"1.5GB", 1L*Math.round(1.5*1024*1024*1024)},
>         {"1.38g", 1L*Math.round(1.38*1024*1024*1024)},
>         {"1g500MB", 1L*1024*1024*1024+500*1024*1024},
>         {"20MB512", 1L*20*1024*1024+512},
>         {"0.5g", 1L*Math.round(0.5*1024*1024*1024)},
>         {"0.5g0.5m", 1L*Math.round(0.5*1024*1024*1024+0.5*1024*1024)},
>     });
>   }
>
>   @Test
>   public void testString2Bytes() {
>     assertEquals(result, Utils.string2Bytes(param));
>   }
> }
>
> import org.junit.Test;
>
> public class UtilsTest2 {
>   @Test(expected = IllegalArgumentException.class)
>   public void testString2Bytes1() {
>     Utils.string2Bytes("23t");   // unknown unit
>   }
>
>   @Test(expected = IllegalArgumentException.class)
>   public void testString2Bytes2() {
>     Utils.string2Bytes("2g50m1.4");   // trailing byte count must be a whole number
>   }
>
>   @Test(expected = IllegalArgumentException.class)
>   public void testString2Bytes3() {
>     // Decimal values need a unit: "4.2" alone is not a valid byte count.
>     Utils.string2Bytes("4.2");
>   }
> }
> ------------------------------
>     I plan to use this method everywhere a capacity is read, while
> staying compatible with the previous usage: a number without a unit still
> defaults to bytes. The reason I haven't forked the project and opened a
> pull request yet is that some parameter names contain "byte" or "bytes",
> and I am not sure whether giving them values like 2GB or 500MB is
> appropriate or would confuse people, so I am asking for your opinion in
> advance.
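
A minimal sketch of the backward-compatible lookup described above, assuming
the same 1024-based k/m/g suffixes as the Utils class but whole numbers only
to keep it short. A plain Map stands in for Flume's Context, and
CapacityConfig.getBytes is a hypothetical name, not part of Flume.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical backward-compatible capacity lookup. A plain Map stands in for
 * Flume's Context; the property names used with it are only examples.
 */
final class CapacityConfig {
    static long getBytes(Map<String, String> props, String key, long defaultValue) {
        String raw = props.get(key);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue; // not configured: keep the existing default
        }
        String value = raw.trim().toLowerCase(Locale.ROOT);
        // An optional trailing "b" is allowed: "512b", "500mb", "2gb".
        String v = value.endsWith("b") ? value.substring(0, value.length() - 1) : value;
        long multiplier = 1L;
        if (!v.isEmpty()) {
            switch (v.charAt(v.length() - 1)) {
                case 'k': multiplier = 1L << 10; break;
                case 'm': multiplier = 1L << 20; break;
                case 'g': multiplier = 1L << 30; break;
                default: break; // no unit letter: the number is already in bytes (legacy form)
            }
        }
        String digits = multiplier == 1L ? v : v.substring(0, v.length() - 1);
        if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit)) {
            throw new IllegalArgumentException(
                "Invalid capacity for " + key + ": \"" + raw + "\"");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Long.parseLong(digits), multiplier);
    }
}
```

Existing configurations that give byteCapacity as a plain number keep their
meaning; only values carrying a unit letter take the new path.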
>     Should time parameters in the configuration also accept units? I
> think they could. If units are added to capacity values, do you think time
> values should be improved in the same way?
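
A sketch of the time-unit counterpart, assuming the suffixes ms, s, m, h and
d and using java.util.concurrent.TimeUnit for the arithmetic. DurationUnits
and toMillis are hypothetical names; a bare number is read in the property's
existing unit, so older configurations keep their meaning.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical duration parser: "500ms", "30s", "5m", "2h", "1d", or a bare number. */
final class DurationUnits {
    // "ms" is listed before "m" and "s" so "500ms" is read as milliseconds.
    private static final Pattern DURATION = Pattern.compile("(\\d+)\\s*(ms|s|m|h|d)?");

    static long toMillis(String value, TimeUnit legacyUnit) {
        Matcher m = DURATION.matcher(value.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid duration: \"" + value + "\"");
        }
        long amount = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        TimeUnit unit;
        if (suffix == null) {
            unit = legacyUnit; // no suffix: keep the unit the property always used
        } else {
            switch (suffix) {
                case "ms": unit = TimeUnit.MILLISECONDS; break;
                case "s": unit = TimeUnit.SECONDS; break;
                case "m": unit = TimeUnit.MINUTES; break;
                case "h": unit = TimeUnit.HOURS; break;
                default: unit = TimeUnit.DAYS; break; // "d"
            }
        }
        return unit.toMillis(amount);
    }
}
```

Note that "m" means minutes here but megabytes in the capacity converter,
which the documentation would need to spell out.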
>     Besides this, I also want to raise another point: how configuration
> errors are handled. In Flume, when a parameter is misconfigured, the
> default value is used instead and a warning is printed, and in some places
> not even a warning is printed. The following code is from
> MemoryChannel.class:
> ------------------------------
>     try {
>       byteCapacityBufferPercentage = context.getInteger("byteCapacityBufferPercentage",
>           defaultByteCapacityBufferPercentage);
>     } catch (NumberFormatException e) {
>       byteCapacityBufferPercentage = defaultByteCapacityBufferPercentage;
>     }
>     try {
>       transCapacity = context.getInteger("transactionCapacity", defaultTransCapacity);
>     } catch (NumberFormatException e) {
>       transCapacity = defaultTransCapacity;
>       LOGGER.warn("Invalid transation capacity specified, initializing channel"
>           + " to default capacity of {}", defaultTransCapacity);
>     }
> ------------------------------
>      I don't think this is right, because an ordinary user won't pay much
> attention to a warning. He will assume the program is running with the
> parameters he configured, when in fact it is using the default values.
>      I think that if the user doesn't set a property, we should use the
> default value; but if the user does configure a property, he certainly
> expects the program to run with it, so an invalid value should stop the
> program and let the user fix it. Of course, this approach may be too
> radical, so the choice could be left to the user through a new property,
> "< Agent >.configuration.error = defaultValue/stopRunning": defaultValue
> keeps the previous behavior, and stopRunning stops the program.
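
A minimal sketch of the proposed "< Agent >.configuration.error" switch,
assuming its two values map to an enum. ConfigReader and OnError are
hypothetical names, a Map stands in for Flume's Context, and
java.util.logging replaces the SLF4J logger Flume uses so the example runs
on its own.

```java
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical reader that applies a configurable policy to invalid values. */
final class ConfigReader {
    /** Hypothetical values of "< Agent >.configuration.error". */
    enum OnError { DEFAULT_VALUE, STOP_RUNNING }

    private static final Logger LOG = Logger.getLogger(ConfigReader.class.getName());

    private final Map<String, String> props;
    private final OnError onError;

    ConfigReader(Map<String, String> props, OnError onError) {
        this.props = props;
        this.onError = onError;
    }

    int getInteger(String key, int defaultValue) {
        String raw = props.get(key);
        if (raw == null) {
            return defaultValue; // not set by the user: the default is expected
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            String msg = "Invalid value for " + key + ": \"" + raw + "\"";
            if (onError == OnError.STOP_RUNNING) {
                // The user asked for a specific value: refuse to start rather than guess.
                throw new IllegalArgumentException(msg, e);
            }
            // Previous behavior: warn and fall back to the default.
            LOG.warning(msg + ", using default " + defaultValue);
            return defaultValue;
        }
    }
}
```

An unset property still gets its default under either policy; only a value
the user actually wrote, but which cannot be parsed, is treated differently.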
>
>     Thank you very much for reading such a long email; most of it was
> machine-translated. I look forward to your reply, and if possible, I hope
> to become a Flume contributor.
>
>                            from Lisheng Xia