Posted to user@flume.apache.org by Jonathan Hsieh <jo...@cloudera.com> on 2011/07/21 13:48:35 UTC

Re: HBase Sink

[Please subscribe to new flume-user@incubator.apache.org list, bcc
flume-user@cloudera.org, cc flume-user@incubator.apache.org]

Dennis,

These empty messages are added by the E2E agent.  When using the hbase sink,
it should be wrapped in the collector { } construct.  The collector handles ack
delivery and should strip out the empty-body ack messages.

collector(xxx) { hbase(yyy) }

If you are doing this and the "AckType":"end" messages still make it through
to HBase, there may be a problem.
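
For reference, an end-to-end spec would look roughly like this -- the node
names, source, port, and hbase arguments below are placeholders for whatever
you are actually using:

agent : tail("app.log") | agentE2ESink("collectornode", 35853)
collectornode : collectorSource(35853) |
    collector(30000) { hbase("table", "%{row}", "cf", "qual", "%{body}") }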

Jon.

On Mon, Jul 4, 2011 at 8:10 AM, Dennis <de...@gmail.com> wrote:

> Thanks for your help. Right now I've got a working Log4j -> Agent ->
> Collector -> HBase Setup running. There are still some things to iron
> out, but it seems to work.
>
> Something I've noticed:
>
> The Log4jAppender just sends the message body and the log priority -- none of
> the other fields such as time, date, or log level that are defined in the
> log4j config.
> Also, I've got many empty entries between my valid entries within HBase.
> Notice the last entry below, which has an empty body. I suspect it is
> generated by the AgentSink.
>
>  {"body":"1232 342","timestamp":1309789917614,"pri":"WARN","nanos":
> 1309789917614716000,"host":"10.0.0.30","fields":
>
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000¿Tð”"}}
>  {"body":"1232 343","timestamp":1309789919619,"pri":"WARN","nanos":
> 1309789919619332000,"host":"10.0.0.30","fields":
>
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000ÈSÀ
> \u0002"}}
>  {"body":"1232 344","timestamp":1309789921623,"pri":"WARN","nanos":
> 1309789921623857000,"host":"10.0.0.30","fields":
>
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000V7U¡"}}
>  {"body":"1232 345","timestamp":1309789923628,"pri":"WARN","nanos":
> 1309789923628266000,"host":"10.0.0.30","fields":
>
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000!
> 0e7"}}
>  {"body":"","timestamp":1309789925101,"pri":"INFO","nanos":
> 4244446597812,"host":"ubuntu","fields":
>
> {"AckTag":"20110704-163155044+0200.4234389357565.00000022","AckType":"end","AckChecksum":"\u0000\u0000\u00010Óͦ‹"}}
>
> On Jun 28, 10:20 pm, Jonathan Hsieh <j....@cloudera.com> wrote:
> > Dennis, Himanshu,
> >
> > You'd need to pull in the data (let's say via text or tail sources), parse
> > out parts of the line, and then feed it to an hbase sink.
> >
> > It would roughly look like this:
> >
> > node: tail("file") |  regexAll("(\\w+)\\s+(\\w+)", "row", "data")
> > collector(300000) { hbase("table", "%{row}", "cf", "qual", "%{data}") }
> >
> > tail is the source.
> >
> > regexAll (a 0.9.4 feature) would pull a "row" and a "data" attribute out of
> > the event's body. If you are using 0.9.3, you'd need two regex expressions --
> > one to pull out the "row" and one for the "data".
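> >
> > From memory, the 0.9.3 variant would look something like this, chaining the
> > two regex extractors as decorators (the regex() signature -- pattern, group
> > index, attribute name -- is from memory, so double-check it against the
> > 0.9.3 docs):
> >
> > node: tail("file") | { regex("(\\w+)\\s+\\w+", 1, "row") =>
> >   { regex("\\w+\\s+(\\w+)", 1, "data") =>
> >     collector(300000) { hbase("table", "%{row}", "cf", "qual", "%{data}") } } }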
> >
> > The hbase sink writes to the table "table", using the extracted value of
> > "row" as the row key.
> >
> > Finally, you'd wrap it with a collector constructor so that it can handle
> > retries on failure (if hbase goes down) and acks (if agents potentially go
> > down).
> >
> > Jon.
> >
> > On Sun, Jun 26, 2011 at 12:23 PM, Dennis <de...@gmail.com> wrote:
> > > Have you figured out how this is to be accomplished? I'm trying to pipe
> > > it through the RegexAll engine to write everything into a separate
> > > column.
> >
> > > On May 4, 8:10 am, Himanshu <mi...@gmail.com> wrote:
> > > > Hey Jonathan,
> >
> > > > Answers to your questions:
> > > > 1. I have only tried it for a particular event, for which it happens every
> > > > time; I have described my event below.
> > > > 2. I am not using metadata as the rowKey; I am trying to use part of the
> > > > body as the rowKey and another part as the value.
> >
> > > > I will just try to explain my case to you. I am using a text file as the
> > > > source, defined as text("file.text"). The file has two columns, whose
> > > > headers are 'row' and 'data', like...
> >
> > > > row       data
> > > > 123       abc
> > > > 234       def
> >
> > > > and I described sink as
> > > > hbase("test","%{row}","cf","qual","%{data}", writeBufferSize="10",
> > > > writeToWal="true")
> >
> > > > here I am trying to insert the 'row' column from the text file as the
> > > > rowKey and the 'data' column as the value for the columnFamily. Please
> > > > suggest how it can be done.
> > > > And one more thing: if I want to use metadata as the rowKey, say 'part of
> > > > date' as you stated earlier, how can I do this?
> >
> > > > Thanks & Regards
> > > > Himanshu
> >
> > > > On May 4, 10:40 am, Jonathan Hsieh <j....@cloudera.com> wrote:
> >
> > > > > Hey Himanshu,
> >
> > > > > This version of the hbase sink is pretty new, so there might be some
> > > > > rough edges.
> >
> > > > > Some questions:
> >
> > > > > Does this happen for all events or just for some?
> >
> > > > > Do you have a metadata field called "row" and another called "data" in
> > > > > your event?  For "row", I would guess that you have some sort of regex
> > > > > extractor adding a "%{row}" attribute, or that you are using the "parts of
> > > > > date" extractors.  The value/"%{data}" part I would normally expect to be
> > > > > the body, i.e. the "%{body}" escape sequence.
> >
> > > > > Jon.
> >
> > > > > On Tue, May 3, 2011 at 9:47 PM, Himanshu <mi...@gmail.com> wrote:
> > > > > > Jonathan,
> >
> > > > > > Hi, thanks again for your earlier help. Now I am having a problem with
> > > > > > hbaseSink usage. I am trying to write data from a text source; the text
> > > > > > file contains two columns named 'row' and 'data'.
> > > > > > And I described the hbaseSink as
> > > > > > hbase("test","%{row}","cf","qual","%{data}", writeBufferSize="10", writeToWal="true").
> > > > > > In hbase I have a table named 'test' and a columnFamily named 'cf'. When I
> > > > > > submit the query, flume logs
> > > > > > WARN core.Event: Tag row not found
> > > > > > WARN core.Event: Tag data not found
> >
> > > > > > I think there is a problem in my data format, or maybe in the query.
> > > > > > Please point out where I am wrong.
> >
> > > > > > Thanks & Regards
> > > > > > Himanshu
> >
> > > > > > On May 3, 8:03 pm, Jonathan Hsieh <j....@cloudera.com> wrote:
> > > > > > > Himanshu,
> >
> > > > > > > Currently the nodes and the master must both know about the plugin for it
> > > > > > > to work (the master will reject a configuration attempt for sinks it
> > > > > > > doesn't know about).  My guess is that you've loaded the plugin on one but
> > > > > > > not the other.
> >
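> > > > > > > For reference, "loading the plugin" here means adding the plugin class to
> > > > > > > flume-site.xml on the master and on every node, something like the snippet
> > > > > > > below (the class name is just a placeholder -- use the hbase sink plugin's
> > > > > > > actual fully qualified class name):
> >
> > > > > > >   <property>
> > > > > > >     <name>flume.plugin.classes</name>
> > > > > > >     <!-- placeholder: replace with your hbase sink plugin class -->
> > > > > > >     <value>com.example.flume.hbase.HBaseSink</value>
> > > > > > >   </property>
> >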
> > > > > > > You can check whether the plugins are loaded by visiting these web pages
> > > > > > > and seeing if the plugin sinks are listed:
> >
> > > > > > > http://<node>:35862/extension.jsp
> >
> > > > > > > http://<master>:35871/masterext.jsp
> >
> > > > > > > It would be great if you could update the documentation to make this
> > > > > > > easier for folks in the future.  We are transitioning to maven builds and
> > > > > > > some of the instructions may be out of date!
> >
> > > > > > > Jon.
> >
> > > > > > > On Tue, May 3, 2011 at 7:50 AM, Himanshu <mishra1...@gmail.com> wrote:
> > > > > > > > Hi,
> > > > > > > > I have seen FLUME-414 and am trying to put data into hbase through flume
> > > > > > > > using this hbaseSink. I am new to Flume; I performed the following steps:
> > > > > > > > 1. I got your latest code from git and built it with mvn package.
> > > > > > > > 2. I copied the hbaseSink jar to the lib directory of the generated
> > > > > > > > distribution package.
> > > > > > > > 3. I added the plugin class name to flume-site.xml.
> > > > > > > > Now when I start the flume master it loads the plugin classes, but when I
> > > > > > > > try to submit a config query on the master it reports an invalid sink for
> > > > > > > > hbaseSink.
> > > > > > > > Please point out where I am wrong. Thanks in advance.
> >
> > > > > > > --
> > > > > > > // Jonathan Hsieh (shay)
> > > > > > > // Software Engineer, Cloudera
> > > > > > > // j...@cloudera.com
> >
> > > > > --
> > > > > // Jonathan Hsieh (shay)
> > > > > // Software Engineer, Cloudera
> > > > > // j...@cloudera.com
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // j...@cloudera.com
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com