Posted to dev@flume.apache.org by Jeremy Karlson <je...@gmail.com> on 2013/11/27 05:52:01 UTC

Generic JDBC Sink

Is there any interest in a generic JDBC sink?

Over the last few days I decided to try to write one.  I have something that requires more testing, but it seems to be working.

Since the config file is how you’d interact with it, here’s a working example from my source tree:

a.sinks.k.type=jdbc
a.sinks.k.channel=c
a.sinks.k.driver=com.mysql.jdbc.Driver
a.sinks.k.url=jdbc:mysql://localhost:8889/flume
a.sinks.k.user=username
a.sinks.k.password=password
a.sinks.k.batchSize=100
a.sinks.k.sql=insert into twitter (body, timestamp) values (${body:string}, ${header.timestamp:long})

The interesting part is the SQL statement.  You can put anything you want in there - it will get converted to a prepared statement on execution.  The Ant-ish tokens get parsed and replaced with parameters at startup.
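
As an illustration only (this is not the code from the patch), the startup step described above could be sketched in Java roughly as follows. The class name SqlTemplate and its fields are hypothetical; the sketch only assumes that every ${...} token becomes one "?" placeholder and that the token text is kept in parameter order for later binding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns the configured SQL into prepared-statement SQL.
final class SqlTemplate {
    // Matches ${...}; the text between the braces is captured in group 1.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    final String preparedSql;   // SQL with each token replaced by '?'
    final List<String> tokens;  // token bodies, in parameter order

    SqlTemplate(String template) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(template);
        StringBuffer sql = new StringBuffer();
        while (m.find()) {
            found.add(m.group(1));
            m.appendReplacement(sql, "?");
        }
        m.appendTail(sql);
        this.preparedSql = sql.toString();
        this.tokens = found;
    }

    public static void main(String[] args) {
        SqlTemplate t = new SqlTemplate(
            "insert into twitter (body, timestamp) values (${body:string}, ${header.timestamp:long})");
        // prints: insert into twitter (body, timestamp) values (?, ?)
        System.out.println(t.preparedSql);
        // prints: [body:string, header.timestamp:long]
        System.out.println(t.tokens);
    }
}
```

The resulting string is what would be handed to Connection.prepareStatement, and the token list tells the sink which event value to bind to each parameter index.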

The tokens have three parts.  For example, in:

${body:string(UTF-8)}

The first part is the place in the event to get the value from (“body”, “header.foo”, or “custom”).  The second part (“string”) is a type identifier that converts into an appropriate JDBC parameter.  The third part (“UTF-8”) is a configuration string for that type, if needed.  As for types, so far I’ve defined:

body: string (with optional charset encoding), bytearray
header: string, long, int, float, double, date (with mandatory date format and optional timezone)
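
To make the three-part token shape concrete, here is a minimal Java sketch of how one token body could be split into its source, type, and optional configuration. The class name ParamToken and the exact grammar (source, a colon, type, then optional parentheses) are assumptions inferred from the examples in this message, not taken from the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for one parsed token body, e.g. "body:string(UTF-8)".
final class ParamToken {
    // source ":" type, optionally followed by "(" config ")"
    private static final Pattern SHAPE =
        Pattern.compile("([^:]+):([^()]+)(?:\\((.*)\\))?");

    final String source;  // "body", "header.<name>", or "custom"
    final String type;    // e.g. "string", "long", or a converter class name
    final String config;  // text inside the parentheses, or null if absent

    private ParamToken(String source, String type, String config) {
        this.source = source;
        this.type = type;
        this.config = config;
    }

    static ParamToken parse(String text) {
        Matcher m = SHAPE.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed token: " + text);
        }
        return new ParamToken(m.group(1), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        ParamToken t = ParamToken.parse("body:string(UTF-8)");
        // prints: body / string / UTF-8
        System.out.println(t.source + " / " + t.type + " / " + t.config);
    }
}
```

Because only the first colon separates source from type, a date format such as header.ts:date(yyyy-MM-dd HH:mm:ss) would keep its own colons inside the configuration string under this assumed grammar.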

Additionally, if none of those make you happy, you can define your own parameter converters:

${custom:com.company.foo.MyConverter(optionaltextconfig)}
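
The message does not show what a custom converter has to implement, so the following Java sketch is purely hypothetical: the ParameterConverter interface, its two methods, the TruncatingConverter example, and the ConverterLoader helper are all invented for illustration. It only demonstrates the general idea of loading a class by the name given in the token and passing it the optional text configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Hypothetical contract; the interface in the actual patch may look different.
interface ParameterConverter {
    // Receives the optional text from inside the token's parentheses (may be null).
    void configure(String config);

    // Turns an event (body plus headers) into the value to bind as a JDBC parameter.
    Object convert(byte[] body, Map<String, String> headers);
}

// Example converter: binds the body as a string, truncated to the length
// given in the config text.
class TruncatingConverter implements ParameterConverter {
    private int maxLength = Integer.MAX_VALUE;

    public void configure(String config) {
        if (config != null && !config.isEmpty()) {
            maxLength = Integer.parseInt(config.trim());
        }
    }

    public Object convert(byte[] body, Map<String, String> headers) {
        String text = new String(body, StandardCharsets.UTF_8);
        return text.length() <= maxLength ? text : text.substring(0, maxLength);
    }
}

final class ConverterLoader {
    // Instantiates the class named in a ${custom:...} token and hands it its config.
    static ParameterConverter load(String className, String config) {
        try {
            ParameterConverter c = (ParameterConverter) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
            c.configure(config);
            return c;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load converter " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a sink, the class name and config would come from the parsed token.
        ParameterConverter c = load(TruncatingConverter.class.getName(), "5");
        Object value = c.convert("hello world".getBytes(StandardCharsets.UTF_8),
                Collections.<String, String>emptyMap());
        System.out.println(value);  // prints: hello
    }
}
```

Returning a plain Object keeps the sketch independent of a database; a real sink would more likely bind the value directly on a PreparedStatement.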

I know there is still improvement to be made, but I’d like to get some feedback, bug fixes, and maybe get it included before I do a bunch of useless work.  If there is interest, how would you like it for review or inclusion?

 -- Jeremy

Re: Generic JDBC Sink

Posted by Hari Shreedharan <hs...@cloudera.com>.

Haha, no. I have just been too swamped to start a meaningful discussion. If you do have time, you can do that too - you don’t need to wait for me to start it.


Thanks,
Hari


On Thursday, December 12, 2013 at 9:40 AM, Jeremy Karlson wrote:

> Did I fall asleep at the wheel for a bit and miss the discussion on
> contributed sources / sinks?
>  
> -- Jeremy

Re: Generic JDBC Sink

Posted by Jeremy Karlson <je...@gmail.com>.

Did I fall asleep at the wheel for a bit and miss the discussion on
contributed sources / sinks?

-- Jeremy



On Thu, Nov 28, 2013 at 11:04 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

> I think we could add this to flume as a contrib module (rather than in core
> flume itself). At this time, there is no contrib module yet, but I will
> start a discussion on this early next week on the dev list and let's take
> it from there.
>
>
> Hari

Re: Generic JDBC Sink

Posted by Hari Shreedharan <hs...@cloudera.com>.

I think we could add this to flume as a contrib module (rather than in core
flume itself). At this time, there is no contrib module yet, but I will
start a discussion on this early next week on the dev list and let's take
it from there.


Hari

On Thursday, November 28, 2013, Jeremy Karlson wrote:

> I suppose that really depends on the usage scenario.  There are a hundred
> things that may affect the ability of the Flume chain to keep up with
> incoming data, only one of which is the sink being a JDBC connection.  I
> think for cases like mine where the data is structured and of a reasonable
> volume, a JDBC connection makes sense.
>
> I guess what I'm saying is that if someone uses it without thinking or
> testing what they're doing with it...  That's not a problem with JDBC, the
> sink, or Flume.  It's a problem with the operator.  :-P
>
> -- Jeremy

Re: Generic JDBC Sink

Posted by Jeremy Karlson <je...@gmail.com>.

I suppose that really depends on the usage scenario.  There are a hundred
things that may affect the ability of the Flume chain to keep up with
incoming data, only one of which is the sink being a JDBC connection.  I
think for cases like mine where the data is structured and of a reasonable
volume, a JDBC connection makes sense.

I guess what I'm saying is that if someone uses it without thinking or
testing what they're doing with it...  That's not a problem with JDBC, the
sink, or Flume.  It's a problem with the operator.  :-P

-- Jeremy



On Thu, Nov 28, 2013 at 8:33 AM, Steve Morin <st...@stevemorin.com> wrote:

> I think the biggest problem is not that people wouldn't want to use it, but
> that data wouldn't be written to the DB fast enough to clear the channels in
> many moderate-volume cases.
>
> I'll follow the ticket, thanks.

Re: Generic JDBC Sink

Posted by Steve Morin <st...@stevemorin.com>.

I think the biggest problem is not that people wouldn't want to use it, but
that data wouldn't be written to the DB fast enough to clear the channels in
many moderate-volume cases.

I'll follow the ticket, thanks.


On Thu, Nov 28, 2013 at 8:17 AM, Jeremy Karlson <je...@gmail.com> wrote:

> Hi Steve,
>
> I’ve submitted the sink for review here:
>
> http://issues.apache.org/jira/browse/FLUME-2256
>
> If it’s something that interests you, I encourage you to apply the patch
> and let me know if it meets your needs or if you find problems.
>
> So far, no movement on it…  But it’s only been a couple of days.  If Flume
> doesn’t want it (for whatever reason) I’ll just take off all of the Apache
> headers and put it up on GitHub with a similar license.  It’ll get open
> sourced one way or another, but I think folding it into Flume makes the
> most sense.
>
> -- Jeremy

Re: Generic JDBC Sink

Posted by Jeremy Karlson <je...@gmail.com>.

Hi Steve,

I’ve submitted the sink for review here:

http://issues.apache.org/jira/browse/FLUME-2256

If it’s something that interests you, I encourage you to apply the patch and let me know if it meets your needs or if you find problems.

So far, no movement on it…  But it’s only been a couple of days.  If Flume doesn’t want it (for whatever reason) I’ll just take off all of the Apache headers and put it up on GitHub with a similar license.  It’ll get open sourced one way or another, but I think folding it into Flume makes the most sense.

-- Jeremy


On Nov 28, 2013, at 7:39, Steve Morin <st...@stevemorin.com> wrote:

> Jeremy,
>   I am interested in a JDBC Flume sink. Are you open sourcing it?
> -Steve

Re: Generic JDBC Sink

Posted by Steve Morin <st...@stevemorin.com>.

Jeremy,
  I am interested in a JDBC Flume sink. Are you open sourcing it?
-Steve

