Posted to user@flume.apache.org by Victor Sanchez <vh...@gmail.com> on 2013/06/17 16:02:36 UTC

Problems with flume + Hbase sink

Hi,

I am having problems using org.apache.flume.sink.hbase.HBaseSink. I also
tried org.apache.flume.sink.hbase.AsyncHBaseSink, with no success.

I'm running on:

Flume NG 1.3.0-cdh4.3.0 CDH4
Hadoop 2.0.0-cdh4.3.0 CDH4
HBase 0.94.6-cdh4.3.0 CDH4
Zookeeper 3.4.5-cdh4.3.0 CDH4
Cloudera Manager Management Daemons 4.5.0 Not applicable

1. I ran into https://issues.cloudera.org/browse/DISTRO-438 and applied
the workaround: "Remove or rename zoo.cfg file from /etc/zookeeper/conf."

2. HBase seems to be properly configured. I can manually do a put into the
table, but nothing arrives when I use the sink.

hbase(main):004:0> put 'test_mEEsures','test_row1','M:cM1','test_value1'
0 row(s) in 0.0770 seconds

hbase(main):007:0> scan 'test_mEEsures'
ROW                             COLUMN+CELL
 test_row1                      column=M:cM1, timestamp=1370965687758, value=test_value1

hbase(main):003:0> describe 'test_mEEsures'
DESCRIPTION                                                          ENABLED
 {NAME => 'test_mEEsures', FAMILIES => [{NAME => 'M',                true
 DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.1020 seconds

3. I also have working examples of Flume writing to HDFS, so I know that
the source and channel are properly configured.


4. The problem seems to be in the HBase sink. When I send a test message
with nc, the Flume log shows that an event is created, but no record of it
appears in HBase. I checked the Flume and HBase logs but cannot see what I
am missing.

Any tip will be more than welcome!


Here is the sink part of the flume config I'm using:

mEEsuresAgent.sinks.SinkToHBase.channel       = MemoryChannel
mEEsuresAgent.sinks.SinkToHBase.type          = org.apache.flume.sink.hbase.HBaseSink
mEEsuresAgent.sinks.SinkToHBase.table         = test_mEEsures
mEEsuresAgent.sinks.SinkToHBase.columnFamily  = M
mEEsuresAgent.sinks.SinkToHBase.column        = cM1
mEEsuresAgent.sinks.SinkToHBase.serializer    = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
mEEsuresAgent.sinks.SinkToHBase.batchSize     = 1
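
One way to sanity-check a sink configuration like the one above is to compare its keys against the properties the sink is documented to read. The following is only a sketch: the key set in KNOWN_SINK_KEYS is an assumption based on the HBaseSink section of the Flume User Guide and may not match Flume 1.3.0 exactly, and the helper names parse_properties and unknown_sink_keys are hypothetical.

```python
# Assumption: keys HBaseSink reads directly, as listed in the Flume User Guide.
# Anything under "serializer." is assumed to be handed to the serializer.
KNOWN_SINK_KEYS = {"channel", "type", "table", "columnFamily",
                   "batchSize", "serializer"}

SAMPLE_CONFIG = """
mEEsuresAgent.sinks.SinkToHBase.channel       = MemoryChannel
mEEsuresAgent.sinks.SinkToHBase.type          = org.apache.flume.sink.hbase.HBaseSink
mEEsuresAgent.sinks.SinkToHBase.table         = test_mEEsures
mEEsuresAgent.sinks.SinkToHBase.columnFamily  = M
mEEsuresAgent.sinks.SinkToHBase.column        = cM1
mEEsuresAgent.sinks.SinkToHBase.serializer    = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
mEEsuresAgent.sinks.SinkToHBase.batchSize     = 1
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def unknown_sink_keys(props, agent, sink):
    """Return sink property names that are not in the assumed key set."""
    prefix = f"{agent}.sinks.{sink}."
    unknown = []
    for key in props:
        if not key.startswith(prefix):
            continue
        short = key[len(prefix):]
        if short in KNOWN_SINK_KEYS or short.startswith("serializer."):
            continue
        unknown.append(short)
    return sorted(unknown)


flagged = unknown_sink_keys(parse_properties(SAMPLE_CONFIG),
                            "mEEsuresAgent", "SinkToHBase")
print(flagged)
```

Under that assumption, the only key flagged is `column`. If the sink does ignore it, the column name would probably have to be passed to the serializer instead (for SimpleHbaseEventSerializer, likely as `serializer.payloadColumn`), but this should be checked against the documentation for the Flume version in use.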

This is part of the Flume log (please check the last line):
6:41:15.533 PM INFO org.apache.flume.node.AbstractConfigurationProvider
Created channel MemoryChannel
6:41:15.534 PM INFO org.apache.flume.source.DefaultSourceFactory
Creating instance of source mEEsuresSRC1, type syslogudp
6:41:15.570 PM INFO org.apache.flume.sink.DefaultSinkFactory
Creating instance of sink: SinkToHBase, type:
org.apache.flume.sink.hbase.HBaseSink
6:41:15.972 PM INFO org.apache.flume.sink.hbase.HBaseSink
The write to WAL option is set to: true
6:41:15.974 PM INFO org.apache.flume.node.AbstractConfigurationProvider
Channel MemoryChannel connected to [mEEsuresSRC1, SinkToHBase]
6:41:15.981 PM INFO org.apache.flume.node.Application
Starting new configuration:{
sourceRunners:{mEEsuresSRC1=EventDrivenSourceRunner: {
source:org.apache.flume.source.SyslogUDPSource{name:mEEsuresSRC1,state:IDLE}
}} sinkRunners:{SinkToHBase=SinkRunner: {
policy:org.apache.flume.sink.DefaultSinkProcessor@2af081 counterGroup:{
name:null counters:{} } }}
channels:{MemoryChannel=org.apache.flume.channel.MemoryChannel{name:
MemoryChannel}} }
6:41:15.985 PM INFO org.apache.flume.node.Application
Starting Channel MemoryChannel
6:41:15.986 PM INFO org.apache.flume.node.Application
Waiting for channel: MemoryChannel to start. Sleeping for 500 ms
6:41:16.031 PM INFO org.apache.flume.instrumentation.MonitoredCounterGroup
Monitoried counter group for type: CHANNEL, name: MemoryChannel, registered
successfully.
6:41:16.031 PM INFO org.apache.flume.instrumentation.MonitoredCounterGroup
Component type: CHANNEL, name: MemoryChannel started
6:41:16.486 PM INFO org.apache.flume.node.Application
Starting Sink SinkToHBase
6:41:16.486 PM INFO org.apache.flume.node.Application
Starting Source mEEsuresSRC1
6:41:16.577 PM INFO org.mortbay.log
Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
6:41:16.856 PM INFO org.mortbay.log
jetty-6.1.26
6:41:16.914 PM INFO org.mortbay.log
Started SocketConnector@0.0.0.0:41414
6:41:18.581 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:zookeeper.version=3.4.5-cdh4.3.0--1, built on 05/20/2013
20:55 GMT
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:host.name=myhadoop.cluster
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.version=1.6.0_31
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.vendor=Sun Microsystems Inc.
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.home=/usr/java/jdk1.6.0_31/jre
6:41:18.582 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.class.path=/var/run/cloudera-scm-agent/ ... (lots
of stuff)
6:41:18.583 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.library.path=:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hbase/bin/../lib/native/Linux-amd64-64
6:41:18.583 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.io.tmpdir=/tmp
6:41:18.583 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:java.compiler=<NA>
6:41:18.584 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:os.name=Linux
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:os.arch=amd64
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:os.version=2.6.32-279.14.1.el6.x86_64
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:user.name=flume
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:user.home=/var/lib/flume-ng
6:41:18.596 PM INFO org.apache.zookeeper.ZooKeeper
Client environment:user.dir=/var/run/cloudera-scm-agent/process/3234-flume-AGENT
6:41:18.600 PM INFO org.apache.zookeeper.ZooKeeper
Initiating client connection, connectString=myhadoop.cluster:2181
sessionTimeout=60000 watcher=hconnection
6:41:18.791 PM INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
The identifier of this process is 6845@myhadoop.cluster
6:41:18.836 PM INFO org.apache.zookeeper.ClientCnxn
Opening socket connection to server myhadoop.cluster/11.52.6.180:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
6:41:18.856 PM INFO org.apache.zookeeper.ClientCnxn
Socket connection established to myhadoop.cluster/11.52.6.180:2181,
initiating session
6:41:18.881 PM INFO org.apache.zookeeper.ClientCnxn
Session establishment complete on server myhadoop.cluster/11.52.6.180:2181,
sessionid = 0x13f0a901280001a, negotiated timeout = 60000
6:41:19.575 PM WARN org.apache.hadoop.conf.Configuration
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
6:42:17.288 PM WARN org.apache.flume.source.SyslogUtils
Event created from Invalid Syslog data.
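
Since the log itself does not show whether the sink ever takes events from the channel, the channel counters may help narrow this down. The sketch below assumes that the Jetty connector on port 41414 in the log is Flume's HTTP monitoring endpoint, that it serves JSON at /metrics, and that the channel counters are named EventPutSuccessCount, EventTakeSuccessCount and ChannelSize. The function names and the sample data are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_metrics(url):
    """Fetch the JSON metrics document from the (assumed) monitoring endpoint."""
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_channel(metrics, channel_name):
    """Compare events put into a channel with events taken out of it."""
    counters = metrics.get("CHANNEL." + channel_name, {})
    # Counter values are read as strings or numbers, so convert them to int.
    put = int(counters.get("EventPutSuccessCount", 0))
    taken = int(counters.get("EventTakeSuccessCount", 0))
    size = int(counters.get("ChannelSize", 0))
    if put == 0:
        verdict = "no events reached the channel; check the source"
    elif taken < put or size > 0:
        verdict = "events are waiting in the channel; the sink is not draining them"
    else:
        verdict = "the sink took every event; check the serializer and the HBase side"
    return {"put": put, "taken": taken, "size": size, "verdict": verdict}


# Hypothetical sample data, not taken from a real agent.
sample = {"CHANNEL.MemoryChannel": {"EventPutSuccessCount": "3",
                                    "EventTakeSuccessCount": "0",
                                    "ChannelSize": "3"}}
print(summarize_channel(sample, "MemoryChannel"))

# Against a live agent (host and port are assumptions from the log above):
# print(summarize_channel(
#     fetch_metrics("http://myhadoop.cluster:41414/metrics"), "MemoryChannel"))
```

If events are put but never taken, the sink is probably failing before it commits its transaction. If every event is taken, the data may be landing in a different row or column than the one being scanned.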

Re: Problems with flume + Hbase sink

Posted by Victor Sanchez <vh...@gmail.com>.
Hi, that is because I am not sending proper syslog data, so I assumed the
WARN is OK. I have a working example with the same source (syslog) and the
same "badly" formed data that sinks to HDFS.

To send the data I used nc:

$ nc -4 -u myhadoopNN 4444
manual message
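
For comparison, a message with a syslog-style header could be sent instead of bare text. This is a minimal sketch assuming the BSD syslog layout (a `<PRI>` prefix where PRI = facility * 8 + severity, followed by a timestamp, host name and tag). The host name, tag and timestamp are placeholder values, and whether the Flume syslog source accepts this exact format without the warning has not been verified here.

```python
import socket


def syslog_priority(facility, severity):
    """PRI value of a syslog message: facility * 8 + severity."""
    return facility * 8 + severity


def build_syslog_message(text, host="testhost", tag="test",
                         timestamp="Jun 17 16:02:36",
                         facility=1, severity=5):
    """Build a BSD-syslog-style datagram; defaults are placeholder values."""
    pri = syslog_priority(facility, severity)  # 1 * 8 + 5 = 13 (user.notice)
    return f"<{pri}>{timestamp} {host} {tag}: {text}".encode("utf-8")


def send_udp(payload, server, port):
    """Send one UDP datagram over IPv4, like `nc -4 -u server port`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (server, port))
    finally:
        sock.close()


payload = build_syslog_message("manual message")
print(payload)

# To send it to the source used above (not executed here):
# send_udp(payload, "myhadoopNN", 4444)
```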

Just to try, I changed the source to netcat (so I won't get the WARN).
Still no success with either of the HBase sinks.




On Mon, Jun 17, 2013 at 6:02 PM, Alexander Alten-Lorenz <wget.null@gmail.com
> wrote:

> Hi,
>
> > 6:42:17.288 PM        WARN    org.apache.flume.source.SyslogUtils
> > Event created from Invalid Syslog data.
>
> Looks like your syslog source data is malformed.
>
> - Alex
>
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>

Re: Problems with flume + Hbase sink

Posted by Alexander Alten-Lorenz <wg...@gmail.com>.
Hi,

> 6:42:17.288 PM	WARN	org.apache.flume.source.SyslogUtils	
> Event created from Invalid Syslog data.

Looks like your syslog source data is malformed.

- Alex


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF