Posted to dev@flume.apache.org by "Jarek Jarcec Cecho (Created) (JIRA)" <ji...@apache.org> on 2011/11/25 08:12:40 UTC

[jira] [Created] (FLUME-860) Combination of ExecSource and HDFS DataStream is removing end line characters

Combination of ExecSource and HDFS DataStream is removing end line characters
-----------------------------------------------------------------------------

                 Key: FLUME-860
                 URL: https://issues.apache.org/jira/browse/FLUME-860
             Project: Flume
          Issue Type: Bug
    Affects Versions: NG alpha 2
            Reporter: Jarek Jarcec Cecho


I've noticed that the combination of ExecSource and an HDFS sink configured to use DataStream removes end-of-line characters and thus creates a one-line output file.

I used two CentOS boxes: the first acted as an agent, reading a local log file using ExecSource; the second acted as a collector, waiting for input events and storing them on HDFS. The two machines were connected using an Avro sink/source pair. The configuration files for both machines, as well as their logs, are attached to this JIRA issue.
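The topology described above can be sketched as a single properties file holding both agents, which fits the commands below: both read the same conf/configuration.properties and select an agent by name with -n. This is a hypothetical reconstruction, not the attached configuration. It assumes the property syntax of released Flume 1.x, which may differ from NG alpha 2, and the component names (tailSrc, memCh, avroSink, avroSrc, hdfsSink), the tail -F command, the hddev02 hostname, port 41414 and the HDFS path are all placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the reported two-tier topology, assuming
# Flume 1.x property syntax. Component names, the tail command, host, port
# and HDFS path are placeholders; the real settings are in the attached
# hddev01.properties and hddev02.properties.
mkdir -p conf
cat > conf/configuration.properties <<'EOF'
# Agent hddev01: tail the local log file and forward events over Avro.
hddev01.sources = tailSrc
hddev01.channels = memCh
hddev01.sinks = avroSink
hddev01.sources.tailSrc.type = exec
hddev01.sources.tailSrc.command = tail -F /var/log/jarcec
hddev01.sources.tailSrc.channels = memCh
hddev01.channels.memCh.type = memory
hddev01.sinks.avroSink.type = avro
hddev01.sinks.avroSink.hostname = hddev02
hddev01.sinks.avroSink.port = 41414
hddev01.sinks.avroSink.channel = memCh

# Collector hddev02: receive Avro events and write them to HDFS as a DataStream.
hddev02.sources = avroSrc
hddev02.channels = memCh
hddev02.sinks = hdfsSink
hddev02.sources.avroSrc.type = avro
hddev02.sources.avroSrc.bind = 0.0.0.0
hddev02.sources.avroSrc.port = 41414
hddev02.sources.avroSrc.channels = memCh
hddev02.channels.memCh.type = memory
hddev02.sinks.hdfsSink.type = hdfs
hddev02.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/jarcec
hddev02.sinks.hdfsSink.hdfs.fileType = DataStream
hddev02.sinks.hdfsSink.channel = memCh
EOF
```

Note that, like the reported setup, this sketch sets hdfs.fileType = DataStream without setting hdfs.writeFormat.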

I started both flume-ng instances using the following commands:
./bin/flume-ng node --conf conf/ --classpath flume-ng.jar --f conf/configuration.properties -n hddev01 > hddev01.log 2>&1
./bin/flume-ng node --conf conf/ --classpath flume-ng.jar --f conf/configuration.properties -n hddev02  > hddev02.log 2>&1

The input file was created using the following small bash script (it was run after flume-ng had started successfully):
for i in $(seq -w 01 10); do echo $i; echo Yoda-$i >> /var/log/jarcec; sleep 1s; done

Please note that I had to apply the patch from FLUME-858 to get the DataStream file type working.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (FLUME-860) Combination of ExecSource and HDFS DataStream is removing end line characters

Posted by "Jarek Jarcec Cecho (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarek Jarcec Cecho updated FLUME-860:
-------------------------------------

    Attachment: hddev02.log
    

[jira] [Updated] (FLUME-860) Combination of ExecSource and HDFS DataStream is removing end line characters

Posted by "Jarek Jarcec Cecho (Updated) (JIRA)" <ji...@apache.org>.

Jarek Jarcec Cecho updated FLUME-860:
-------------------------------------

    Attachment: hddev01.properties
    

[jira] [Updated] (FLUME-860) Combination of ExecSource and HDFS DataStream is removing end line characters

Posted by "Jarek Jarcec Cecho (Updated) (JIRA)" <ji...@apache.org>.

Jarek Jarcec Cecho updated FLUME-860:
-------------------------------------

    Attachment: hddev02.properties
    

[jira] [Updated] (FLUME-860) Combination of ExecSource and HDFS DataStream is removing end line characters

Posted by "Jarek Jarcec Cecho (Updated) (JIRA)" <ji...@apache.org>.

Jarek Jarcec Cecho updated FLUME-860:
-------------------------------------

    Attachment: input
    

[jira] [Resolved] (FLUME-860) Combination of ExecSource and HDFS DataStream is removing end line characters

Posted by "Jarek Jarcec Cecho (Resolved) (JIRA)" <ji...@apache.org>.

Jarek Jarcec Cecho resolved FLUME-860.
--------------------------------------

    Resolution: Invalid
      Assignee: Jarek Jarcec Cecho

This is not a bug. The property hdfs.writeFormat needs to be set to Text so that the correct FlumeFormatter, which appends a newline character, is used instead of the default one.
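A minimal sketch of applying this fix by appending the setting to the collector's configuration. It assumes the collector agent is named hddev02 and its HDFS sink hdfsSink (both hypothetical names), that the sink already sets hdfs.fileType = DataStream, and that the key layout follows later Flume 1.x releases, which may differ from NG alpha 2.

```shell
#!/usr/bin/env bash
# Hypothetical names: agent "hddev02", HDFS sink "hdfsSink".
# Assumes hdfs.fileType = DataStream is already set for this sink.
mkdir -p conf
cat >> conf/configuration.properties <<'EOF'
# Per FLUME-860, Text selects the formatter that appends a newline to each event.
hddev02.sinks.hdfsSink.hdfs.writeFormat = Text
EOF
```

The collector has to be restarted (or its configuration reloaded) for the new property to take effect on newly written files.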

It would be nice, though, if the HDFSSink automatically used the Text FlumeFormatter when the output file type is DataStream; I'll create another JIRA issue for that.
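Why DataStream output without a newline-appending formatter collapses into one line can be illustrated without Flume. This shell sketch is not Flume code: it assumes, as the symptom suggests, that ExecSource turns each input line into an event body without its trailing newline, and it contrasts a writer that emits bodies as-is with one that appends a newline, as the Text formatter is described as doing above. The function names write_body_only and write_text are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only, not Flume code: two ways of serializing event bodies
# into a raw data stream.

# Writes each body with no delimiter, reproducing the reported symptom.
write_body_only() {
  local body
  for body in "$@"; do
    printf '%s' "$body"
  done
}

# Writes each body followed by a newline, like the Text formatter
# described in the resolution comment.
write_text() {
  local body
  for body in "$@"; do
    printf '%s\n' "$body"
  done
}

# Event bodies matching the reporter's input script (no trailing newlines).
events=()
for i in $(seq -w 01 10); do
  events+=("Yoda-$i")
done

write_body_only "${events[@]}" | wc -l   # counts 0 newlines: a single unterminated line
write_text "${events[@]}" | wc -l        # counts 10 newlines: one line per event
```

With the body-only writer the ten events run together as Yoda-01Yoda-02Yoda-03 and so on, which matches the one-line output file reported in this issue.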
                

[jira] [Updated] (FLUME-860) Combination of ExecSource and HDFS DataStream is removing end line characters

Posted by "Jarek Jarcec Cecho (Updated) (JIRA)" <ji...@apache.org>.

Jarek Jarcec Cecho updated FLUME-860:
-------------------------------------

    Attachment: output
    

[jira] [Updated] (FLUME-860) Combination of ExecSource and HDFS DataStream is removing end line characters

Posted by "Jarek Jarcec Cecho (Updated) (JIRA)" <ji...@apache.org>.

Jarek Jarcec Cecho updated FLUME-860:
-------------------------------------

    Attachment: hddev01.log
    