Posted to user@flume.apache.org by "Tinte garcia, Miguel Angel" <mi...@atos.net> on 2014/08/01 10:10:21 UTC

RE: Flume to Hbase columns with regexp

The result is the same as the one I posted below; it inserts two values into one column:
column3

col1val: firstPart
col2val: This is the first part of the result



From: Jonathan Natkins [mailto:natty@streamsets.com]
Sent: Thursday, July 31, 2014 7:02 PM
To: user@flume.apache.org
Subject: Re: Flume to Hbase columns with regexp

What happens when you change the colNames parameter to col1val,col2val? The line should be:

agent.sinks.hbaseSink.serializer.colNames=col1val,col2val

On Thu, Jul 31, 2014 at 1:45 AM, Tinte garcia, Miguel Angel <mi...@atos.net> wrote:
Hi Jonathan,
My current configuration is the following:
agent.sinks.hbaseSink.type=hbase
agent.sinks.hbaseSink.channel=memoryChannel
agent.sinks.hbaseSink.table=hbase_table
# filling first column
agent.sinks.hbaseSink.columnFamily=column1
agent.sinks.hbaseSink.batchSize = 5000
# splitting input parameters
agent.sinks.hbaseSink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.hbaseSink.serializer.regex=^[^,]+,(.+),(.+)$
agent.sinks.hbaseSink.serializer.colNames=col1val
# filling second column
#agent.sinks.hbaseSink.columnFamily=column2
# splitting input parameters
#agent.sinks.hbaseSink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
#agent.sinks.hbaseSink.serializer.regex=^[^,]+,(.+),(.+)$
#agent.sinks.hbaseSink.serializer.colNames=col2val

As you can see above, I have only been able to put one colName into a columnFamily; the second, commented-out declaration didn’t work. The Flume event to be stored in the HBase table is generated by a POST request to the following URL:
http://localhost:8080/flumeEvent/rest/data/inject?colval11=1&colval2=005&colval3=test
With the following content: “This is a test for different columns”

Thanks again


From: Jonathan Natkins [mailto:natty@streamsets.com]
Sent: Thursday, July 31, 2014 12:10 AM
To: user@flume.apache.org
Subject: Re: Flume to Hbase columns with regexp

Hi Miguel,

What does your configuration look like after you made the initial changes? As far as I can tell, the HbaseSink only has the ability to load into one column family, so declaring two of them probably won't help.

The fact that you're getting different values in the same column leads me to believe that your column values are split across multiple events. Is that accurate? What did the event that produced two values in one column look like?

Thanks,
Natty

On Wed, Jul 30, 2014 at 2:10 AM, Tinte garcia, Miguel Angel <mi...@atos.net> wrote:
Hi Jonathan,
Thanks for your comments below. This is what I have been able to do so far (result copied from hbasexplorer):

rowkey - timestamp | column1 | column2 | column3
1406706418563-47PT7nzRvW-0
col1val: firstPart
col2val: This is the first part of the result

So I have been able to split out the different colNval tokens (which is great), but I am still unable to store them in different HBase table columns. I tried declaring two columnFamily entries, each followed by a colNames parameter with one value, but it didn’t work.
Is it possible to insert these values into different columns?

Thanks again


From: Jonathan Natkins [mailto:natty@streamsets.com]
Sent: Tuesday, July 29, 2014 1:15 AM
To: user@flume.apache.org
Subject: Re: Flume to Hbase columns with regexp

Alright, a couple things:

1) It looks like my intuition was correct. Changing your config to use colNames instead of columns seems to get things working.

2) Based on the description of what you're trying to do, it looks like your regex might be slightly off. For example, if I had a row:

familyName,col1val,col2val

Your regex will result in column1 containing 'familyName', and column2 containing 'col1val,col2val', which I don't think is what you're trying to do. Probably you want to use this regex, or something like it:

^[^,]+,(.+),(.+)$

This regex will result in column1 containing 'col1val', column2 containing 'col2val', and the first value (which appears to be the family name) being thrown away. Is this what you were trying to do?
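
To make the difference between the two patterns concrete, here is a minimal sketch in Python. It uses Python's `re` module rather than the Java regex engine that Flume runs on, on the assumption that both engines treat these simple patterns the same way. The three-field row is the one from this message; the four-field row `a,b,c,d` is a hypothetical input added for illustration.

```python
import re

line = "familyName,col1val,col2val"

# Pattern from the original configuration: two groups, the first field is captured.
original = re.compile(r"^([^,]+),(.+)$")
# Suggested pattern: the first field is matched outside any parentheses, so it is dropped.
suggested = re.compile(r"^[^,]+,(.+),(.+)$")

original_groups = original.match(line).groups()
suggested_groups = suggested.match(line).groups()

print(original_groups)   # ('familyName', 'col1val,col2val')
print(suggested_groups)  # ('col1val', 'col2val')

# Hypothetical four-field row: the first (.+) is greedy, so it swallows an
# extra comma-separated field before backtracking to the last comma.
four_field_groups = suggested.match("a,b,c,d").groups()
print(four_field_groups)  # ('b,c', 'd')
```

The last case suggests that the pattern only splits cleanly when a row has exactly three fields; with more fields, one group per field using `[^,]+` is likely the safer choice.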

As an aside, the mechanics of the RegexHbaseEventSerializer are to take the matching groups and map those to the list of column names defined by the colNames config parameter. If you want to toss any data away, just make sure it's not within a set of parentheses.
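
The group-to-column mapping described above can be sketched roughly as follows. This is a Python illustration, not the serializer's actual code: `map_groups_to_columns` is a hypothetical helper, the sample payload is made up, and returning nothing when the number of groups differs from the number of column names is an assumption about how a mismatch is handled.

```python
import re

def map_groups_to_columns(regex, col_names, payload):
    """Hypothetical helper: pair each capture group with a column name."""
    match = re.match(regex, payload)
    if match is None or len(match.groups()) != len(col_names):
        # Assumption: an event that does not fit the pattern, or whose group
        # count differs from the number of column names, yields no cells.
        return {}
    return dict(zip(col_names, match.groups()))

regex = r"^[^,]+,(.+),(.+)$"
payload = "familyName,firstPart,secondPart"  # made-up event body

cells = map_groups_to_columns(regex, ["col1val", "col2val"], payload)
print(cells)  # {'col1val': 'firstPart', 'col2val': 'secondPart'}

# One column name for two capture groups: nothing is produced under the
# assumption stated above.
mismatch = map_groups_to_columns(regex, ["col1val"], payload)
print(mismatch)  # {}
```

Under this reading, each entry in colNames becomes a column inside the single configured columnFamily, so the number of names should equal the number of parenthesized groups in the regex.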

Let me know if you have any more questions, or if you have trouble getting this to work.

Thanks!
Natty

On Mon, Jul 28, 2014 at 3:48 PM, Jonathan Natkins <na...@streamsets.com> wrote:
I haven't tested this myself, but a quick look at the code suggests that your column name specification may be configured incorrectly. It looks like it should be:

agent.sinks.hbaseSink.serializer.colNames = column1,column2

I'm trying this out myself, though, so if I find something definitive, I'll let you know.

On Mon, Jul 28, 2014 at 4:19 AM, Tinte garcia, Miguel Angel <mi...@atos.net> wrote:
Hi,
I am sending a Flume event to insert some information into a specific HBase table. My flume conf.properties looks like this:
agent.sinks.hbaseSink.table=table_name
agent.sinks.hbaseSink.columnFamily=idColumn
agent.sinks.hbaseSink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.hbaseSink.serializer.regex=^([^,]+),(.+)$
agent.sinks.hbaseSink.serializer.columns = column1,column2

Basically, what I am trying to do is split the input values into three different columns: idColumn, column1, column2
With this configuration, no error is returned, but nothing is written to the table. Any idea what I am doing wrong?

Thanks in advance









RE: Flume to Hbase columns with regexp

Posted by "Tinte garcia, Miguel Angel" <mi...@atos.net>.
Yes, sure, let me explain in more detail. What I am trying to do is split the comma-separated values sent in a Flume event and store them in separate HBase columns. For instance: val1,val2,val3,val4 into column1,column2,column3,column4

To do so, I have the following configuration:

agent.sinks.hbaseSink.serializer.regex=^[^,]+,(.+),(.+)$

agent.sinks.hbaseSink.table=hbase_table

agent.sinks.hbaseSink.columnFamily=column1

agent.sinks.hbaseSink.serializer.colNames=val1,val2,val3,val4

What I am looking for is a configuration that allows me to store these values in different columns. With the configuration above, I am only able to store the values in one column, as in the following example:

Column1

val1: firstPart, val2: This is the first part of the result,  val3: thirdPart, val4:lastValue
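
One thing worth checking in the configuration above is that the regex has two capture groups while colNames lists four names. The sketch below, written in Python rather than the Java regex engine Flume uses, shows a pattern with one group per field. The four-group pattern and the event body are assumptions for illustration (the body is pieced together from the values in the example above), not a tested Flume setup.

```python
import re

col_names = ["val1", "val2", "val3", "val4"]

# Regex from the configuration above: only two capture groups.
two_group = re.compile(r"^[^,]+,(.+),(.+)$")
print(two_group.groups)  # 2

# Assumed alternative: one group per comma-separated field, nothing dropped.
four_group = re.compile(r"^([^,]+),([^,]+),([^,]+),([^,]+)$")

# Hypothetical event body built from the values shown in the example.
event_body = "firstPart,This is the first part of the result,thirdPart,lastValue"

match = four_group.match(event_body)
cells = dict(zip(col_names, match.groups()))
for name, value in cells.items():
    print(f"{name}: {value}")
```

With one group per name, each value pairs with exactly one entry in colNames. All four would still be stored under the single configured columnFamily, each as its own column within that family.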





I hope it is clearer now.

Thanks

From: Jonathan Natkins [mailto:natty@streamsets.com]
Sent: Friday, August 01, 2014 7:28 PM
To: user@flume.apache.org
Subject: Re: Flume to Hbase columns with regexp

Let's be a little more explicit, since this result doesn't make a lot of sense to me:

What is the value you have configured for agent.sinks.hbaseSink.serializer.colNames and agent.sinks.hbaseSink.serializer.regex, and can you give an example of the exact event that is coming into Flume?

Is it still the same as you described in the previous email, or have you changed these values?

Thanks,
Natty


Re: Flume to Hbase columns with regexp

Posted by Jonathan Natkins <na...@streamsets.com>.
Let's be a little more explicit, since this result doesn't make a lot of
sense to me:

What is the value you have configured for
agent.sinks.hbaseSink.serializer.colNames
and agent.sinks.hbaseSink.serializer.regex, and can you give an example of
the exact event that is coming into Flume?

Is it still the same as you described in the previous email, or have you
changed these values?

Thanks,
Natty

