Posted to user@flume.apache.org by Santiago Mola <sm...@stratio.com> on 2014/09/23 15:14:00 UTC

What Source/Sink would you want next?

Hi all,

I'm a developer at Stratio, where I work on Stratio
Ingestion. [1] Stratio Ingestion is a distribution of Apache Flume "on
steroids" with many extra sources, sinks, morphlines, etc. We would like to
start contributing back to Apache Flume as much as we can, so we would like
to know how interested the community is in each plugin so that we can
prioritize our efforts when contributing code to Apache Flume.

You can check all components at our GitHub page [1], but here's a summary:

Sinks:

- Cassandra (driver 2.0.2) supporting custom or automatic mapping
- MongoDB (driver 2.12) supporting custom or automatic mapping
- JDBC supporting custom queries or automatic mapping
- Kafka (0.8), now obsolete with Flume 1.6.0
- Stratio Streaming

Sources:

- Flume Statistics (consumes Flume agent statistics exposed through its
monitoring REST API).
- Redis PubSub
- REST Client
- SNMP Traps (v1, v2c and v3)

Deserializers:

- XML XPath deserializer


Please, let me know if any of these components is useful to you. I'd love
to hear further information about use cases and specific needs.

Thank you.

Best,
-- 

Santiago M. Mola


<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*

Re: What Source/Sink would you want next?

Posted by Hari Shreedharan <hs...@cloudera.com>.
Strong +1. If there are new features or fixes you want to contribute - please do, I am trying to make sure I spend some time reviewing and committing. If you don’t get a review in a few days, please ping on the jira - I will look at it!


Thanks,
Hari

On Fri, Sep 26, 2014 at 11:07 AM, Arvind Prabhakar <ar...@apache.org>
wrote:

> (cross-posting this to dev@)
> While I do not speak for the availability of other committers of the
> project, I would like to spend sometime with the contributors to help
> identify what are the most important needs of the project, and see how best
> we can get those committed into the codebase.
> Santiago (and others who would like to contribute) - please go ahead and
> create the necessary Jiras if they do not exist already, and invite the
> community to vote on those. That way we can prioritize the review and
> commit for functionality that is aligned with community requirements.
> Regards,
> Arvind Prabhakar
> On Fri, Sep 26, 2014 at 5:13 AM, jean garutti <la...@yahoo.fr> wrote:
>> hi
>> This seems to be great.
>> I'll wait to have the 'production ready' flag for ELS mapping patch.
>> I think more effort should be done to have this sink more configurable
>> like what we can do with logstash.
>>
>> anyway it's nice to share your development to the community
>> i'd love to have the mongodb sink packaged in the official flume release.
>>
>> jean
>>
>>
>>   On Thursday, 25 September 2014 at 9:48, Santiago Mola <sm...@stratio.com>
>> wrote:
>>
>>
>> Hi Jean,
>>
>> 2014-09-24 22:44 GMT+02:00 Jean <la...@yahoo.fr>:
>>
>> A solid mongodb source would be Nice.
>>
>>
>> Definitely!
>>
>>
>> I wish the same for elasticsearch sink where we could specify the mapping
>> for the headers instead of sending everything as a string
>>
>>
>> We have a serializer that creates mappings for ElasticSearch [1]. It is
>> not ready for production [2] but it is one of our priorities.
>>
>> [1]
>> https://github.com/Stratio/stratio-ingestion/tree/develop/stratio-serializers/stratio-elasticsearch-serializer
>> [2] https://github.com/Stratio/stratio-ingestion/issues/21
>>
>> Thanks for your feedback,
>>
>> --
>>
>> Santiago M. Mola
>>
>>
>> <http://www.stratio.com/>
>> Avenida de Europa, 26. Ática 5. 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>>
>>
>>

Re: What Source/Sink would you want next?

Posted by Santiago Mola <sm...@stratio.com>.
Hi Arvind, hi Hari

2014-09-26 20:06 GMT+02:00 Arvind Prabhakar <ar...@apache.org>:

> Santiago (and others who would like to contribute) - please go ahead and
> create the necessary Jiras if they do not exist already, and invite the
> community to vote on those. That way we can prioritize the review and
> commit for functionality that is aligned with community requirements.
>


2014-09-26 20:10 GMT+02:00 Hari Shreedharan <hs...@cloudera.com>:

> Strong +1. If there are new features or fixes you want to contribute -
> please do, I am trying to make sure I spend some time reviewing and
> committing. If you don’t get a review in a few days, please ping on the
> jira - I will look at it!
>

Sure, I will. Thanks!

Best,
-- 

Santiago M. Mola


<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*

Re: What Source/Sink would you want next?

Posted by Arvind Prabhakar <ar...@apache.org>.
(cross-posting this to dev@)

While I do not speak for the availability of other committers of the
project, I would like to spend some time with the contributors to help
identify the most important needs of the project, and see how best
we can get those committed into the codebase.

Santiago (and others who would like to contribute) - please go ahead and
create the necessary Jiras if they do not exist already, and invite the
community to vote on those. That way we can prioritize the review and
commit for functionality that is aligned with community requirements.

Regards,
Arvind Prabhakar

On Fri, Sep 26, 2014 at 5:13 AM, jean garutti <la...@yahoo.fr> wrote:

> hi
> This seems to be great.
> I'll wait to have the 'production ready' flag for ELS mapping patch.
> I think more effort should be done to have this sink more configurable
> like what we can do with logstash.
>
> anyway it's nice to share your development to the community
> i'd love to have the mongodb sink packaged in the official flume release.
>
> jean
>
>
>   On Thursday, 25 September 2014 at 9:48, Santiago Mola <sm...@stratio.com>
> wrote:
>
>
> Hi Jean,
>
> 2014-09-24 22:44 GMT+02:00 Jean <la...@yahoo.fr>:
>
> A solid mongodb source would be Nice.
>
>
> Definitely!
>
>
> I wish the same for elasticsearch sink where we could specify the mapping
> for the headers instead of sending everything as a string
>
>
> We have a serializer that creates mappings for ElasticSearch [1]. It is
> not ready for production [2] but it is one of our priorities.
>
> [1]
> https://github.com/Stratio/stratio-ingestion/tree/develop/stratio-serializers/stratio-elasticsearch-serializer
> [2] https://github.com/Stratio/stratio-ingestion/issues/21
>
> Thanks for your feedback,
>
> --
>
> Santiago M. Mola
>
>
> <http://www.stratio.com/>
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>
>
>

Re: What Source/Sink would you want next?

Posted by jean garutti <la...@yahoo.fr>.
Hi,
This looks great.
I'll wait until the ELS mapping patch is flagged 'production ready'.
I think more effort should go into making this sink more configurable, like what we can do with Logstash.

Anyway, it's nice of you to share your development with the community.

I'd love to have the MongoDB sink packaged in the official Flume release.


jean



On Thursday, 25 September 2014 at 9:48, Santiago Mola <sm...@stratio.com> wrote:

> Hi Jean,
>
> 2014-09-24 22:44 GMT+02:00 Jean <la...@yahoo.fr>:
>
>> A solid mongodb source would be Nice.
>
> Definitely!
>
>> I wish the same for elasticsearch sink where we could specify the mapping for the headers instead of sending everything as a string
>
> We have a serializer that creates mappings for ElasticSearch [1]. It is not ready for production [2] but it is one of our priorities.
>
> [1] https://github.com/Stratio/stratio-ingestion/tree/develop/stratio-serializers/stratio-elasticsearch-serializer
> [2] https://github.com/Stratio/stratio-ingestion/issues/21
>
> Thanks for your feedback,
>
> --
>
> Santiago M. Mola
>
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 352 59 42 // @stratiobd

Re: What Source/Sink would you want next?

Posted by Santiago Mola <sm...@stratio.com>.
Hi Jean,

2014-09-24 22:44 GMT+02:00 Jean <la...@yahoo.fr>:

> A solid mongodb source would be Nice.
>

Definitely!


> I wish the same for elasticsearch sink where we could specify the mapping
> for the headers instead of sending everything as a string
>

We have a serializer that creates mappings for ElasticSearch [1]. It is not
ready for production [2] but it is one of our priorities.

[1]
https://github.com/Stratio/stratio-ingestion/tree/develop/stratio-serializers/stratio-elasticsearch-serializer
[2] https://github.com/Stratio/stratio-ingestion/issues/21

Thanks for your feedback,
-- 

Santiago M. Mola


<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*

Re: What Source/Sink would you want next?

Posted by Jean <la...@yahoo.fr>.
A solid MongoDB source would be nice.
I'd wish the same for the Elasticsearch sink, where we could specify the mapping for the headers instead of sending everything as a string.

> On 23 Sept. 2014, at 17:31, Otis Gospodnetic <ot...@gmail.com> wrote:
> 
> Hi Santiago,
> 
> Very nice.
> +1 for SNMP traps :)
> 
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
>> On Tue, Sep 23, 2014 at 9:14 AM, Santiago Mola <sm...@stratio.com> wrote:
>> Hi all,
>> 
>> I'm developer at Stratio, where I'm part of the development of Stratio Ingestion. [1] Stratio Ingestion is a distribution of Apache Flume "on steroids" with many extra sources, sinks, morphlines, etc. We would like to start contributing back to Apache Flume as much as we can. So we would like to know what's the community interest on each plugin so we can prioritize our efforts when contributing code to Apache Flume.
>> 
>> You can check all components at our GitHub page [1], but here's a summary:
>> 
>> Sinks:
>> 
>> - Cassandra (driver 2.0.2) supporting custom or automatic mapping
>> - MongoDB (driver 2.12) supporting custom or automatic mapping
>> - JDBC supporting custom queries or automatic mapping
>> - Kafka (0.8), now obsolete with Flume 1.6.0
>> - Stratio Streaming
>> 
>> Sources:
>> 
>> - Flume Statistics (consumes Flume agent statistics exposed through its monitoring REST API).
>> - Redis PubSub
>> - REST Client
>> - SNMP Traps (v1, v2c and v3)
>> 
>> Deserializers:
>> 
>> - XML XPath deserializer
>> 
>> 
>> Please, let me know if any of these components is useful to you. I'd love to hear further information about use cases and specific needs.
>> 
>> Thank you.
>> 
>> Best,
>> -- 
>> Santiago M. Mola
>> 
>> 
>> Avenida de Europa, 26. Ática 5. 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 352 59 42 // @stratiobd
> 

Re: What Source/Sink would you want next?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Santiago,

Very nice.
+1 for SNMP traps :)

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Sep 23, 2014 at 9:14 AM, Santiago Mola <sm...@stratio.com> wrote:

> Hi all,
>
> I'm developer at Stratio, where I'm part of the development of Stratio
> Ingestion. [1] Stratio Ingestion is a distribution of Apache Flume "on
> steroids" with many extra sources, sinks, morphlines, etc. We would like to
> start contributing back to Apache Flume as much as we can. So we would like
> to know what's the community interest on each plugin so we can prioritize
> our efforts when contributing code to Apache Flume.
>
> You can check all components at our GitHub page [1], but here's a summary:
>
> Sinks:
>
> - Cassandra (driver 2.0.2) supporting custom or automatic mapping
> - MongoDB (driver 2.12) supporting custom or automatic mapping
> - JDBC supporting custom queries or automatic mapping
> - Kafka (0.8), now obsolete with Flume 1.6.0
> - Stratio Streaming
>
> Sources:
>
> - Flume Statistics (consumes Flume agent statistics exposed through its
> monitoring REST API).
> - Redis PubSub
> - REST Client
> - SNMP Traps (v1, v2c and v3)
>
> Deserializers:
>
> - XML XPath deserializer
>
>
> Please, let me know if any of these components is useful to you. I'd love
> to hear further information about use cases and specific needs.
>
> Thank you.
>
> Best,
> --
>
> Santiago M. Mola
>
>
> <http://www.stratio.com/>
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>

Re: What Source/Sink would you want next?

Posted by Santiago Mola <sm...@stratio.com>.
Hi Jeremy

2014-09-26 1:21 GMT+02:00 Jeremy Karlson <je...@gmail.com>:

>
> I believe my patch was inspiration for your JDBC sink.  I'm glad it was
> helpful.
>

Yes. We used your approach to query templates, which is perfect for us.
Thank you!


> We had a need for JDBC sinks, and I've since heard from others who had the
> same need.  I started work on that generic one, but I got the impression
> that it wasn't wanted, for some reasons valid and invalid.  It eventually
> got lost in a holding pattern and I gave up pursuing it.
>

It might not have been a priority, or it may have been overlooked, but I'm
sure there's a place for JDBC integration in Apache Flume.


> While I applaud your efforts to meet community needs, I think you also need
> to ask "What will Flume accept?" before you spend any sizeable amount of
> time on it.
>

We are not too worried about that. We do not use Apache Flume packages
ourselves, but our own distribution (Stratio Ingestion) [1]. Of course,
we'd love to converge as much as possible with Apache Flume and do our best
to contribute, but anything that is not accepted will still be part of our
Flume distribution if we need it.


> If you do submit the JDBC sink, I would be happy to do what I can to help
> it along.
>

Thanks! I'm working on an improved version of the JDBC sink that meets
Apache Flume conventions; it'll need wider testing soon.


[1] https://github.com/Stratio/stratio-ingestion

Best,
-- 

Santiago M. Mola


<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*

Re: What Source/Sink would you want next?

Posted by Jeremy Karlson <je...@gmail.com>.
Hey Santiago,

I believe my patch was inspiration for your JDBC sink.  I'm glad it was
helpful.

We had a need for JDBC sinks, and I've since heard from others who had the
same need.  I started work on that generic one, but I got the impression
that it wasn't wanted, for some reasons valid and invalid.  It eventually
got lost in a holding pattern and I gave up pursuing it.

While I applaud your efforts to meet community needs, I think you also need
to ask "What will Flume accept?" before you spend any sizeable amount of
time on it.

If you do submit the JDBC sink, I would be happy to do what I can to help
it along.
-- Jeremy


On Tue, Sep 23, 2014 at 6:14 AM, Santiago Mola <sm...@stratio.com> wrote:

> Hi all,
>
> I'm developer at Stratio, where I'm part of the development of Stratio
> Ingestion. [1] Stratio Ingestion is a distribution of Apache Flume "on
> steroids" with many extra sources, sinks, morphlines, etc. We would like to
> start contributing back to Apache Flume as much as we can. So we would like
> to know what's the community interest on each plugin so we can prioritize
> our efforts when contributing code to Apache Flume.
>
> You can check all components at our GitHub page [1], but here's a summary:
>
> Sinks:
>
> - Cassandra (driver 2.0.2) supporting custom or automatic mapping
> - MongoDB (driver 2.12) supporting custom or automatic mapping
> - JDBC supporting custom queries or automatic mapping
> - Kafka (0.8), now obsolete with Flume 1.6.0
> - Stratio Streaming
>
> Sources:
>
> - Flume Statistics (consumes Flume agent statistics exposed through its
> monitoring REST API).
> - Redis PubSub
> - REST Client
> - SNMP Traps (v1, v2c and v3)
>
> Deserializers:
>
> - XML XPath deserializer
>
>
> Please, let me know if any of these components is useful to you. I'd love
> to hear further information about use cases and specific needs.
>
> Thank you.
>
> Best,
> --
>
> Santiago M. Mola
>
>
> <http://www.stratio.com/>
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>