Posted to user@flume.apache.org by NewTo Flume <ne...@gmail.com> on 2011/12/18 03:47:40 UTC

Flume collector example from Cloudera's UserGuide does not work as expected

The section of the UserGuide that shows how to set up a collector and write
to it
[http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_tiering_flume_nodes_agents_and_collectors]
has this configuration:

    host : console | agentSink("localhost",35853) ;
    collector : collectorSource(35853) | console ;

I changed this to:

    dataSource : console | agentSink("localhost") ;
    dataCollector : collectorSource() | console ;

I started the nodes with:

    flume node_nowatch -n dataSource
    flume node_nowatch -n dataCollector

I have tried this on two systems:

1. Cloudera's own demo VM running inside VirtualBox with 2GB RAM.
   It comes with Flume 0.9.4-cdh3u2.

2. Ubuntu LTS (Lucid) with the Debian package and OpenJDK (without any Hadoop
   packages installed), as a VM running inside VirtualBox with 2GB RAM.
   I followed the steps here:
   [https://ccp.cloudera.com/display/CDHDOC/Flume+Installation#FlumeInstallation-InstallingtheFlumeRPMorDebianPackages]

Here is what I did:

Running `flume dump 'collectorSource()'` gives:

    $ sudo netstat -anp | grep 35853
    tcp6       0      0 :::35853                :::*                    LISTEN      3520/java
    $ ps aux | grep java | grep 3520
    1000      3520  0.8  2.3 1050508 44676 pts/0   Sl+  15:38   0:02 java
        -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log
        -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console
        -Dwatchdog.root.logger=INFO,console
        -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64
        com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c dump: collectorSource() | console;

My assumption is that:

    flume dump 'collectorSource()'

is the same as running the config:

    dump : collectorSource() | console ;

and starting the node with

    flume node -1 -n dump -c "dump: collectorSource() | console;" -s

Running the node configured as `dataSource : console | agentSink("localhost")` then gives:

    $ sudo netstat -anp | grep 35853
    tcp6       0      0 :::35853                :::*                    LISTEN      3520/java
    tcp6       0      0 127.0.0.1:44878         127.0.0.1:35853         ESTABLISHED 3593/java
    tcp6       0      0 127.0.0.1:35853         127.0.0.1:44878         ESTABLISHED 3520/java

    $ ps aux | grep java | grep 3593
    1000      3593  1.2  3.0 1130956 57644 pts/1   Sl+  15:41   0:07 java
        -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log
        -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console
        -Dwatchdog.root.logger=INFO,console
        -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64
        com.cloudera.flume.agent.FlumeNode -n dataSource

The observed behaviour is **exactly the same in both** VirtualBox VMs:

An unending stream of this at **dataSource**:

    2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
    durability.NaiveFileWALManager: File lives in
    /tmp/flume-cloudera/agent/dataSource/writing/20111215-152748172-0500.1116926245855.00000034
    2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
    hdfs.SeqfileEventSink: constructed new seqfile event sink:
    file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:27:58,254 [naive file wal consumer-35] INFO
    durability.NaiveFileWALManager: opening log file
    20111215-152748172-0500.1116926245855.00000034
    2011-12-15 15:27:58,254 [Roll-TriggerThread-1] INFO
    endtoend.AckListener$Empty: Empty Ack Listener began
    20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:27:58,256 [naive file wal consumer-35] INFO
    agent.WALAckManager: Ack for
    20111215-152748172-0500.1116926245855.00000034 is queued to be checked
    2011-12-15 15:27:58,257 [naive file wal consumer-35] INFO
    durability.WALSource: end of file NaiveFileWALManager
    (dir=/tmp/flume-cloudera/agent/dataSource )
    2011-12-15 15:28:07,874 [Heartbeat] INFO agent.WALAckManager:
    Retransmitting 20111215-152657736-0500.1066489868855.00000034 after
    being stale for 60048ms
    2011-12-15 15:28:07,875 [naive file wal consumer-35] INFO
    durability.NaiveFileWALManager: opening log file
    20111215-152657736-0500.1066489868855.00000034
    2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
    agent.WALAckManager: Ack for
    20111215-152657736-0500.1066489868855.00000034 is queued to be checked
    2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
    durability.WALSource: end of file NaiveFileWALManager
    (dir=/tmp/flume-cloudera/agent/dataSource )
    2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
    hdfs.SeqfileEventSink: closed
    /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
    endtoend.AckListener$Empty: Empty Ack Listener ended
    20111215-152758253-0500.1127006668855.00000034

    2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
    durability.NaiveFileWALManager: File lives in
    /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
    hdfs.SeqfileEventSink: constructed new seqfile event sink:
    file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
    2011-12-15 15:28:08,336 [naive file wal consumer-35] INFO
    durability.NaiveFileWALManager: opening log file
    20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:28:08,337 [Roll-TriggerThread-1] INFO
    endtoend.AckListener$Empty: Empty Ack Listener began
    20111215-152808335-0500.1137089135855.00000034
    2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
    agent.WALAckManager: Ack for
    20111215-152758253-0500.1127006668855.00000034 is queued to be checked
    2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
    durability.WALSource: end of file NaiveFileWALManager
    (dir=/tmp/flume-cloudera/agent/dataSource )
    2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
    hdfs.SeqfileEventSink: closed
    /tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
    2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
    endtoend.AckListener$Empty: Empty Ack Listener ended
    20111215-152808335-0500.1137089135855.00000034

    ..

    2011-12-15 15:35:24,763 [Heartbeat] INFO agent.WALAckManager:
    Retransmitting 20111215-152707823-0500.1076576334855.00000034 after
    being stale for 60277ms
    2011-12-15 15:35:24,763 [Heartbeat] INFO
    durability.NaiveFileWALManager: Attempt to retry chunk
    '20111215-152707823-0500.1076576334855.00000034'  in LOGGED state.
    There is no need for state transition.
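
One way to see the pattern in this log is to pull out the retransmission entries and compare each one with the time its chunk was opened. Below is a minimal Python sketch of that. It assumes each log entry sits on a single line in a saved copy of the agent log (the excerpt above is wrapped), and that WAL chunk names begin with the local open time as yyyyMMdd-HHmmssSSS; `retransmissions` and `chunk_start` are hypothetical helper names, not part of Flume.

```python
import re
from datetime import datetime

# Matches WALAckManager retransmission entries such as
# "2011-12-15 15:28:07,874 [Heartbeat] INFO agent.WALAckManager:
#  Retransmitting <chunk> after being stale for 60048ms"
# Assumption: each log entry occupies a single line in the saved log.
RETRANSMIT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"Retransmitting (?P<chunk>\S+) after being stale for (?P<stale>\d+)ms"
)


def chunk_start(chunk):
    """Return the time encoded at the start of a WAL chunk name.

    Assumption: names begin with yyyyMMdd-HHmmssSSS in local time,
    e.g. 20111215-152657736-0500...; the zone offset and the
    remaining fields are ignored.
    """
    date_part, time_part = chunk.split("-")[:2]
    return datetime.strptime(f"{date_part}-{time_part}", "%Y%m%d-%H%M%S%f")


def retransmissions(lines):
    """Yield (chunk, stale_ms, seconds_since_chunk_opened) per retransmission."""
    for line in lines:
        match = RETRANSMIT.search(line)
        if match is None:
            continue  # not a retransmission entry
        logged_at = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
        age = (logged_at - chunk_start(match["chunk"])).total_seconds()
        yield match["chunk"], int(match["stale"]), age
```

Applied to the two retransmission entries above, this reports that chunk 20111215-152657736-... was retransmitted about 70 s after it was opened, which is roughly one roll interval (the chunks above roll about every 10 s) plus the 60048 ms of staleness the log mentions, consistent with no ack ever arriving for it. The 15:35 entry concerns a chunk opened at 15:27:07, almost 500 s earlier, so that chunk appears to have been retried several times. To run it over a saved log, something like `for row in retransmissions(open("dataSource.log")): print(row)` should work, where `dataSource.log` is a hypothetical file name.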

An unending stream of this at **dataCollector**:

    localhost [INFO Thu Dec 15 15:31:09 EST 2011] {
    AckChecksum : (long)1323981059821  (string) ' 4Ck��'
    (double)6.54133557402E-312 } { AckTag :
    20111215-153059819-0500.1308572847855.00000034 } { AckType : end }
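
The `��` in the collector output looks like corruption, but the console sink appears to print the same AckChecksum attribute three ways: as a long, as a string and as a double. The Python sketch below checks that those three renderings agree, assuming (not confirmed by the post) that the attribute is the 8-byte big-endian encoding of the long and that the string form is a UTF-8 decode.

```python
import struct

# AckChecksum as the console sink rendered it with the (long) prefix.
value = 1323981059821

# Assumption: the attribute stores exactly these 8 bytes, big-endian
# (the order Java's ByteBuffer uses by default).
raw = struct.pack(">q", value)

as_long = struct.unpack(">q", raw)[0]    # the 8 bytes read as a signed long
as_double = struct.unpack(">d", raw)[0]  # the 8 bytes read as an IEEE 754 double
# The 8 bytes decoded as UTF-8; invalid bytes become U+FFFD.
as_text = raw.decode("utf-8", errors="replace")

print(raw.hex())  # 00000134436bb6ed
print(as_long)    # 1323981059821
print(as_double)  # 6.54133557402e-312
```

The double matches the `(double)6.54133557402E-312` shown above, and `as_text` is the control bytes 00 00 01, then `4Ck`, then two U+FFFD replacement characters for the bytes B6 ED, which lines up with `' 4Ck��'`. So the collector is printing the metadata of an ack event (`AckType : end`) rather than garbled data.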


How do I get console <-> console communication through a collector working
correctly again?